Fetal aneuploidy detection by sequencing

ABSTRACT

The present invention provides apparatus and methods for enriching components or cells from a sample and conducting genetic analysis, such as SNP genotyping, to provide diagnostic results for fetal disorders or conditions.

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Application No. 60/804,816, filed Jun. 14, 2006, which application is incorporated herein by reference.

BACKGROUND OF THE INVENTION

Analysis of specific cells can give insight into a variety of diseases. These analyses can provide non-invasive tests for detection, diagnosis and prognosis of diseases, thereby eliminating the risk of invasive diagnosis. For instance, social developments have resulted in an increased number of prenatal tests. However, the methods available today, amniocentesis and chorionic villus sampling (CVS), are potentially harmful to the mother and to the fetus. The rate of miscarriage for pregnant women undergoing amniocentesis is increased by 0.5-1%, and that figure is slightly higher for CVS. Because of the inherent risks posed by amniocentesis and CVS, these procedures are offered primarily to older women, i.e., those over 35 years of age, who have a statistically greater probability of bearing children with congenital defects. As a result, a pregnant woman at the age of 35 has to balance an average risk of 0.5-1% to induce an abortion by amniocentesis against an age related probability for trisomy 21 of less than 0.3%.

Some non-invasive methods have already been developed to diagnose specific congenital defects. For example, maternal serum alpha-fetoprotein, and levels of unconjugated estriol and human chorionic gonadotropin can be used to identify a proportion of fetuses with Down's syndrome; however, these tests are not one hundred percent accurate. Similarly, ultrasonography is used to determine congenital defects involving neural tube defects and limb abnormalities, but is useful only after fifteen weeks' gestation.

The methods of the present invention allow for the detection of fetal cells and fetal abnormalities when fetal cells are mixed with a population of maternal cells, even when the maternal cells dominate the mixture.

SUMMARY OF THE INVENTION

The presence of fetal cells within the blood of pregnant women offers the opportunity to develop a prenatal diagnostic that replaces amniocentesis and thereby eliminates the risk of today's invasive diagnosis. However, fetal cells represent a small number of cells against the background of a large number of maternal cells in the blood, which makes the analysis time consuming and prone to error. Current technologies and protocols for highly parallel SNP detection with DNA microarray readout result in inaccurate calls when there are too few starting DNA copies or when a particular allele represents a small fraction in the population of input DNA molecules.

The present invention relates to methods for detecting a fetal abnormality by determining the ratio of the abundance of one or more maternal alleles to the abundance of one or more paternal alleles in the genomic DNA of a sample. The genomic region includes a single nucleotide polymorphism (SNP), which can preferably be an informative SNP. The SNP can be detected by methods that include using a DNA microarray, bead microarray, or high throughput sequencing. In some embodiments, determining the ratio involves detecting an abundance of a nucleotide base at a SNP position. In other embodiments, determining the ratio also comprises calculating an error rate based on amplification. Prior to determining the abundance of allele(s), the sample can be enriched for fetal cells.

The method of detection is provided by highly parallel SNP detection that can be used to determine the ratios of abundance of maternal and paternal alleles at a plurality of genomic regions present in the sample. In some embodiments, the ratios of abundance are determined in at least 100 genomic regions, which can comprise a single locus, different loci, a single chromosome, or different chromosomes. In some embodiments, a first genomic region (SNP) analyzed is in a genomic region suspected of being trisomic or is trisomic and a second genomic region (SNP) analyzed is in a non-trisomic region or a region suspected of being non-trisomic. The ratio of alleles in the first genomic region can then be compared to the ratio of alleles in the second genomic region, and in some embodiments, the comparison is made by determining the difference in the means of the ratios in the first and second genomic regions. An increase in paternal abundance can be indicative of paternal trisomy, while an increase in maternal abundance can be indicative of maternal trisomy. Alternatively, an increase in paternal abundance or maternal abundance of one or more alleles is indicative of partial trisomy. The first and second genomic regions can be on the same or different chromosomes.
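The comparison of a suspected region against a non-trisomic region by the difference in mean allele ratios can be sketched as follows. This is only an illustration under stated assumptions: the function names, the choice of a paternal-to-maternal ratio, and the abundance values are hypothetical and do not come from the specification.

```python
from statistics import mean


def paternal_to_maternal_ratios(snps):
    """Return one paternal/maternal abundance ratio per SNP.

    `snps` is a list of (maternal_abundance, paternal_abundance) pairs.
    SNPs with zero maternal abundance are skipped to avoid division by zero.
    """
    return [paternal / maternal for maternal, paternal in snps if maternal > 0]


def difference_in_mean_ratios(test_region, reference_region):
    """Mean ratio in the suspected region minus mean ratio in the reference region."""
    test_mean = mean(paternal_to_maternal_ratios(test_region))
    reference_mean = mean(paternal_to_maternal_ratios(reference_region))
    return test_mean - reference_mean


# Hypothetical abundances in arbitrary units (assumed for illustration only).
suspected_region = [(1000, 30), (2000, 60), (500, 15)]
reference_region = [(1000, 20), (2000, 40), (500, 10)]

delta = difference_in_mean_ratios(suspected_region, reference_region)
print(f"difference in mean paternal/maternal ratio: {delta:.4f}")
```

A positive difference in this sketch corresponds to relatively more paternal signal in the suspected region; whether a given difference is meaningful would depend on the measurement noise, which is not modeled here.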

In an embodiment, the invention provides for a method for detecting a fetal abnormality comprising comparing an abundance of one or more maternal alleles in a first genomic region in a maternal blood sample, where said genomic region is suspected of trisomy, with an abundance of one or more maternal alleles in a second genomic region in said blood sample, wherein said second genomic region is non-trisomic. Up to 20 ml of blood can be used to detect the fetal abnormality. The first genomic region that is suspected of trisomy and the second genomic region that is a non-trisomic region can each be present on chromosomes 13, 18, 21 and on the X chromosome.

In some embodiments, a ratio of the abundance of the maternal alleles in the first genomic region to the abundance of the maternal alleles in the second genomic region can be determined and compared to a second ratio obtained for a control sample. The control sample can comprise a diluted portion of the maternal sample, which can be diluted by a factor of at least 1,000.

In some embodiments, detecting the fetal abnormality further involves estimating the number of fetal cells present in the maternal sample. This can be performed by, e.g., ranking the alleles detected according to their abundance. The ranking can then be used to determine an abundance of one or more paternal alleles. In some embodiments, data models can be fitted for optimal detection of aneuploidy. The methods herein can be used to identify monoploidy, triploidy, tetraploidy, pentaploidy and other multiples of the normal haploid state. For example, the data models can be used to determine estimates for the fraction of fetal cells present in a sample and for detecting a fetal abnormality or condition.
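One simple way to picture the ranking step is sketched below. It rests on assumptions that the specification does not state: the locus is an informative SNP at which the mother is homozygous, the fetus is diploid and carries one distinct paternal allele, and measured abundance is proportional to allele copy number with no amplification bias. Under those assumptions, a fetal cell fraction f contributes maternal copies in proportion to 2 - f and paternal copies in proportion to f, so f = 2p / (m + p). All names and numbers here are hypothetical.

```python
def rank_alleles(abundances):
    """Sort (allele, abundance) pairs from most to least abundant."""
    return sorted(abundances.items(), key=lambda item: item[1], reverse=True)


def estimate_fetal_fraction(abundances):
    """Estimate the fetal cell fraction at one informative SNP.

    Assumes (hypothetically) that the top-ranked allele is the maternal
    allele and the second-ranked allele is the paternal allele.
    """
    ranked = rank_alleles(abundances)
    maternal = ranked[0][1]
    paternal = ranked[1][1]
    total = maternal + paternal
    if total == 0:
        raise ValueError("no signal at this locus")
    return 2 * paternal / total


# Hypothetical signal per nucleotide at one SNP position (arbitrary units).
locus = {"A": 1990, "G": 10, "C": 0, "T": 0}
print(rank_alleles(locus))
print(estimate_fetal_fraction(locus))
```

In practice an estimate would presumably be pooled over many loci and corrected for background signal; this sketch shows only the single-locus arithmetic.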

In some embodiments, the abundance of one or more paternal alleles can be compared to the abundance of the maternal alleles at one or more genetic regions. In other embodiments, one or more ratios of the abundance of the paternal allele(s) to the abundance of the maternal allele(s) at one or more genetic regions can be compared with an estimated fraction of fetal cells. A statistical analysis can be performed on the one or more ratios of the abundance of paternal alleles to the abundance of the maternal alleles to determine the presence of fetal DNA in the sample with a level of confidence that exceeds 90%.

INCORPORATION BY REFERENCE

All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 illustrates an overview of the process of the invention.

FIGS. 2A-2D illustrate one embodiment of a size-based separation module.

FIGS. 3A-3C illustrate one embodiment of an affinity separation module.

FIG. 4 illustrates one embodiment of a magnetic separation module.

FIG. 5 illustrates an overview for a typical parallel SNP genotyping assay.

FIG. 6 illustrates the types of SNP calls that result depicting allele strengths at different loci.

FIG. 7 illustrates the concept of rank ordering of allele strengths.

FIG. 8 illustrates a histogram of paternal allele strength normalized relative to maternal alleles.

FIGS. 9A-9B illustrate cell smears of the product and waste fractions.

FIGS. 10A-10F illustrate isolated fetal cells confirmed by the reliable presence of male Y chromosome.

FIG. 11 illustrates trisomy 21 pathology in an isolated fetal nucleated red blood cell.

FIGS. 12A-12D illustrate various embodiments of a size-based separation module.

FIG. 13 illustrates the detection of single copies of a fetal cell genome by qPCR.

FIG. 14 illustrates detection of single fetal cells in binned samples by SNP analysis.

FIG. 15 illustrates a method of trisomy testing. The trisomy 21 screen is based on scoring of target cells obtained from maternal blood. Blood is processed using a cell separation module for hemoglobin enrichment (CSM-HE). Isolated cells are transferred to slides that are first stained and subsequently probed by FISH. Images are acquired, such as from bright field or fluorescent microscopy, and scored. The proportion of trisomic cells of certain classes serves as a classifier for risk of fetal trisomy 21. Fetal genome identification can be performed using assays such as: (1) STR markers; (2) qPCR using primers and probes directed to loci, such as the multi-repeat DYZ locus on the Y-chromosome; (3) SNP detection; and (4) CGH (comparative genome hybridization) array detection.

FIG. 16 illustrates assays that can produce information on the presence of aneuploidy and other genetic disorders in target cells. Information on aneuploidy and other genetic disorders in target cells may be acquired using technologies such as: (1) a CGH array established for chromosome counting, which can be used for aneuploidy determination and/or detection of intra-chromosomal deletions; (2) SNP/taqman assays, which can be used for detection of single nucleotide polymorphisms; and (3) ultra-deep sequencing, which can be used to produce partial or complete genome sequences for analysis.

FIG. 17 illustrates methods of fetal diagnostic assays. Fetal cells are isolated by CSM-HE enrichment of target cells from blood. The designation of the fetal cells may be confirmed using techniques comprising FISH staining (using slides or membranes and optionally an automated detector), FACS, and/or binning. Binning may comprise distribution of enriched cells across wells in a plate (such as a 96 or 384 well plate), microencapsulation of cells in droplets that are separated in an emulsion, or introduction of cells into microarrays of nanofluidic bins. Fetal cells are then identified using methods that may comprise the use of biomarkers (such as fetal (gamma) hemoglobin), allele-specific SNP panels that could detect fetal genome DNA, detection of differentially expressed maternal and fetal transcripts (such as Affymetrix chips), or primers and probes directed to fetal specific loci (such as the multi-repeat DYZ locus on the Y-chromosome). Binning sites that contain fetal cells are then analyzed for aneuploidy and/or other genetic defects using a technique such as CGH array detection, ultra deep sequencing (such as Solexa, 454, or mass spectrometry), STR analysis, or SNP detection.

FIG. 18 illustrates methods of fetal diagnostic assays, further comprising the step of whole genome amplification prior to analysis of aneuploidy and/or other genetic defects.

DETAILED DESCRIPTION OF THE INVENTION

The methods herein are used for detecting the presence and condition of fetal cells in a mixed sample wherein the fetal cells are at a concentration of less than 90, 80, 70, 60, 50, 40, 30, 20, 10, 5 or 1% of all cells in the sample, or at a concentration less than 1:2, 1:4, 1:10, 1:50, 1:100, 1:1000, 1:10,000, 1:100,000, 1:1,000,000, 1:10,000,000 or 1:100,000,000 of all cells in the sample.

FIG. 1 illustrates an overview of the methods and systems herein.

In step 100, a sample to be analyzed for rare cells (e.g. fetal cells) is obtained from an animal. Such animal can be suspected of being pregnant, pregnant, or one that has been pregnant. Such sample can be analyzed by the systems and methods herein to determine a condition in the animal or fetus of the animal. In some embodiments, the methods herein are used to detect the presence of a fetus, sex of a fetus, or condition of the fetus. The animal from whom the sample is obtained can be, for example, a human or a domesticated animal such as a cow, chicken, pig, horse, rabbit, dog, cat, or goat. Samples derived from an animal or human include, e.g., whole blood, sweat, tears, ear flow, sputum, lymph, bone marrow suspension, urine, saliva, semen, vaginal flow, cerebrospinal fluid, brain fluid, ascites, milk, and secretions of the respiratory, intestinal or genitourinary tracts.

To obtain a blood sample, any technique known in the art may be used, e.g. a syringe or other vacuum suction device. A blood sample can be optionally pre-treated or processed prior to enrichment. Examples of pre-treatment steps include the addition of a reagent such as a stabilizer, a preservative, a fixant, a lysing reagent, a diluent, an anti-apoptotic reagent, an anti-coagulation reagent, an anti-thrombotic reagent, a magnetic property regulating reagent, a buffering reagent, an osmolality regulating reagent, a pH regulating reagent, and/or a cross-linking reagent.

When a blood sample is obtained, a preservative such as an anti-coagulation agent and/or a stabilizer is often added to the sample prior to enrichment. This allows for extended time for analysis/detection. Thus, a sample, such as a blood sample, can be enriched and/or analyzed under any of the methods and systems herein within 1 week, 6 days, 5 days, 4 days, 3 days, 2 days, 1 day, 12 hrs, 6 hrs, 3 hrs, 2 hrs, or 1 hr from the time the sample is obtained.

A blood sample can be combined with an agent that selectively lyses one or more cells or components in a blood sample. For example, fetal cells can be selectively lysed releasing their nuclei when a blood sample including fetal cells is combined with deionized water. Such selective lysis allows for the subsequent enrichment of fetal nuclei using, e.g., size or affinity based separation. In another example, platelets and/or enucleated red blood cells are selectively lysed to generate a sample enriched in nucleated cells, such as fetal nucleated red blood cells (fnRBC) and maternal nucleated red blood cells (mnRBC). The fnRBCs can subsequently be separated from the mnRBCs using, e.g., antigen-i affinity or differences in hemoglobin.

When obtaining a sample from an animal (e.g., blood sample), the amount can vary depending upon animal size, its gestation period, and the condition being screened. Up to 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 mL of a sample is obtained. The volume of sample obtained can be 1-50, 2-40, 3-30, or 4-20 mL. Alternatively, more than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 mL of a sample is obtained.

To detect fetal abnormality, a blood sample can be obtained from a pregnant animal or human within 36, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6 or 4 weeks of gestation.

In step 101, a reference or control sample is obtained by any means known in the art. A reference sample is any sample that consists essentially of, or only of, non-fetal cells or non-fetal DNA. A reference sample is preferably a maternal only cell or DNA sample. In some embodiments, a reference sample is a maternal only blood sample. When obtaining a reference sample such as a maternal blood sample from a pregnant female, or one suspected of being pregnant, the sample can be diluted enough to ensure that <<1 fetal cell is expected in the sample. Dilution can be by a factor of about 10 to 1000 fold, or by a factor of greater than 5, 10, 50, 100, 200, 500 to 1000 fold. Alternatively, white blood cells can be obtained from the same organism from whom the mixed sample is obtained. In some cases, the reference sample is obtained by diluting a portion of the mixed sample.
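The requirement that much less than one fetal cell be expected in the diluted reference sample can be illustrated with a short calculation. The sketch below assumes, as a modeling choice not stated in the specification, that the number of fetal cells in a diluted aliquot follows a Poisson distribution; the function names and the starting count of 10 fetal cells are hypothetical.

```python
import math


def expected_fetal_cells(fetal_cells_in_portion, dilution_factor):
    """Expected number of fetal cells left in an aliquot after dilution."""
    return fetal_cells_in_portion / dilution_factor


def probability_of_any_fetal_cell(expected_count):
    """P(at least one fetal cell) under an assumed Poisson model.

    Computes 1 - exp(-expected_count); expm1 keeps precision for small values.
    """
    return -math.expm1(-expected_count)


# Hypothetical: 10 fetal cells in the undiluted portion, diluted 1000-fold.
expected = expected_fetal_cells(10, 1000)
print(expected, probability_of_any_fetal_cell(expected))
```

With these assumed numbers the expected count is well below one, which is the condition the dilution is meant to achieve.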

In step 102, when the sample to be tested or analyzed is a mixed sample (e.g. maternal blood sample), it is enriched for rare cells or rare DNA (e.g. fetal cells, fetal DNA or fetal nuclei) using one or more methods known in the art or disclosed herein. Such enrichment increases the ratio of fetal cells to non-fetal cells, the concentration of fetal DNA to non-fetal DNA, and/or the concentration of fetal cells in volume per total volume of the mixed sample.

In some embodiments, enrichment occurs by selective lysis as described above. For example, enucleated cells may be selectively lysed prior to subsequent enrichment steps or fetal nucleated cells may be selectively lysed prior to separation of the fetal nuclei from other cells and components in the sample.

In some embodiments, enrichment of fetal cells or fetal nuclei occurs using one or more size-based separation modules. Size-based separation modules include filtration modules, sieves, matrixes, etc., including those disclosed in International Publication Nos. WO 2004/113877, WO 2004/0144651, and US Application Publication No. 2004/011956.

In some embodiments, a size-based separation module includes one or more arrays of obstacles that form a network of gaps. The obstacles are configured to direct particles (e.g. cells or nuclei) as they flow through the array/network of gaps into different directions or outlets based on the particle's hydrodynamic size. For example, as a blood sample flows through an array of obstacles, nucleated cells or cells having a hydrodynamic size larger than a predetermined size, e.g., 8 microns, are directed to a first outlet located on the opposite side of the array of obstacles from the fluid flow inlet, while the enucleated cells or cells having a hydrodynamic size smaller than a predetermined size, e.g., 8 microns, are directed to a second outlet also located on the opposite side of the array of obstacles from the fluid flow inlet.

An array can be configured to separate cells smaller than a predetermined size from those larger than a predetermined size by adjusting the size of the gaps, obstacles, and offset in the period between each successive row of obstacles. For example, in some embodiments, obstacles and/or gaps between obstacles can be up to 10, 20, 50, 70, 100, 120, 150, 170, or 200 microns in length or about 2, 4, 6, 8 or 10 microns in length. In some embodiments, an array for size-based separation includes more than 100, 500, 1,000, 5,000, 10,000, 50,000 or 100,000 obstacles that are arranged into more than 10, 20, 50, 100, 200, 500, or 1000 rows. Preferably, obstacles in a first row of obstacles are offset from a previous (upstream) row of obstacles by up to 50% the period of the previous row of obstacles. In some embodiments, obstacles in a first row of obstacles are offset from a previous row of obstacles by up to 45, 40, 35, 30, 25, 20, 15 or 10% the period of the previous row of obstacles. Furthermore, the distance between a first row of obstacles and a second row of obstacles can be up to 10, 20, 50, 70, 100, 120, 150, 170 or 200 microns. A particular offset can be continuous (repeating for multiple rows) or non-continuous. In some embodiments, a separation module includes multiple discrete arrays of obstacles fluidly coupled such that they are in series with one another. Each array of obstacles has a continuous offset, but each subsequent (downstream) array of obstacles has an offset that is different from the previous (upstream) offset. Preferably, each subsequent array of obstacles has a smaller offset than the previous array of obstacles. This allows for a refinement in the separation process as cells migrate through the array of obstacles. Thus, a plurality of arrays can be fluidly coupled in series or in parallel (e.g., more than 2, 4, 6, 8, 10, 20, 30, 40, 50). Fluidly coupling separation modules (e.g., arrays) in parallel allows for high-throughput analysis of the sample, such that at least 1, 2, 5, 10, 20, 50, 100, 200, or 500 mL per hour flows through the enrichment modules or at least 1, 5, 10, or 50 million cells per hour are sorted or flow through the device.
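The row-to-row offset described above can be made concrete with a small geometric sketch. It assumes a simple rectangular lattice in which every row has the same period and each row is shifted by a fixed fraction of that period relative to the row upstream; the function names and the dimensions (a 20 micron period, 20 micron row spacing, 25% offset) are hypothetical choices within the ranges given in the text, not a description of any actual device.

```python
def obstacle_centers(rows, cols, period, row_spacing, offset_fraction):
    """Return (x, y) obstacle centers, in microns, for a staggered array.

    Each row is shifted horizontally by `offset_fraction * period` relative
    to the previous (upstream) row; the shift wraps around after one period.
    """
    centers = []
    for r in range(rows):
        shift = (r * offset_fraction * period) % period
        for c in range(cols):
            centers.append((c * period + shift, r * row_spacing))
    return centers


def rows_per_cycle(offset_fraction):
    """Rows needed before the pattern realigns (meaningful when 1/offset is an integer)."""
    return round(1 / offset_fraction)


# Hypothetical geometry: 20 micron period and row spacing, 25% offset.
centers = obstacle_centers(rows=5, cols=2, period=20.0, row_spacing=20.0, offset_fraction=0.25)
for x, y in centers:
    print(f"x={x:5.1f} um  y={y:5.1f} um")
print("rows per cycle:", rows_per_cycle(0.25))
```

A smaller offset fraction means more rows per cycle, which is one way to read the statement that downstream arrays with smaller offsets refine the separation.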

FIGS. 2A-2D illustrate an example of a size-based separation module. Obstacles (which may be of any shape) are coupled to a flat substrate to form an array of gaps. A transparent cover or lid may be used to cover the array. The obstacles form a two-dimensional array with each successive row shifted horizontally with respect to the previous row of obstacles, where the array of obstacles directs components having a hydrodynamic size smaller than a predetermined size in a first direction and components having a hydrodynamic size larger than a predetermined size in a second direction. The flow of sample into the array of obstacles can be aligned at a small angle (flow angle) with respect to a line-of-sight of the array. Optionally, the array is coupled to an infusion pump to perfuse the sample through the obstacles. The flow conditions of the size-based separation module described herein are such that cells are sorted by the array with minimal damage. This allows for downstream analysis of intact cells and intact nuclei to be more efficient and reliable.

In one embodiment, a size-based separation module comprises an array of obstacles configured to direct fetal cells larger than a predetermined size to migrate along a line-of-sight within the array towards a first outlet or bypass channel leading to a first outlet, while directing cells and analytes smaller than a predetermined size through the array of obstacles in a different direction towards a second outlet.

A variety of enrichment protocols may be utilized although, in most embodiments, gentle handling of the cells is needed to reduce any mechanical damage to the cells or their DNA. This gentle handling also preserves the small number of fetal cells in the sample. Integrity of the nucleic acid being evaluated is an important feature to permit the distinction between the genomic material from the fetal cells and other cells in the sample. In particular, the enrichment and separation of the fetal cells using the arrays of obstacles provides gentle treatment that minimizes cellular damage and maximizes nucleic acid integrity. This permits exceptional levels of separation and makes it possible to subsequently use various formats to very accurately analyze the genome of cells that are present in the sample in extremely low numbers.

In some embodiments, enrichment of fetal cells occurs using one or more capture modules that selectively inhibit the mobility of one or more cells of interest. Preferably, a capture module is fluidly coupled downstream to a size-based separation module. Capture modules can include a substrate having multiple obstacles that restrict the movement of cells or analytes greater than a predetermined size. Examples of capture modules that inhibit the migration of cells based on size are disclosed in U.S. Pat. Nos. 5,837,115 and 6,692,952.

In some embodiments, a capture module includes a two-dimensional array of obstacles that selectively filters or captures cells or analytes having a hydrodynamic size greater than a particular gap size, e.g., predetermined size. Arrays of obstacles adapted for separation by capture can include obstacles having one or more shapes and can be arranged in a uniform or non-uniform order. In some embodiments, a two-dimensional array of obstacles is staggered such that each subsequent row of obstacles is offset from the previous row of obstacles to increase the number of interactions between the analytes being sorted (separated) and the obstacles.

Another example of a capture module is an affinity-based separation module. An affinity-based separation module captures analytes or cells of interest based on their affinity to a structure or particle as opposed to their size. One example of an affinity-based separation module is an array of obstacles that are adapted for complete sample flow through, but for the fact that the obstacles are covered with binding moieties that selectively bind one or more analytes (e.g., cell population) of interest (e.g., red blood cells, fetal cells, or nucleated cells) or analytes not-of-interest (e.g., white blood cells). Binding moieties can include e.g., proteins (e.g., ligands/receptors), nucleic acids having complementary counterparts in retained analytes, antibodies, etc. In some embodiments, an affinity-based separation module comprises a two-dimensional array of obstacles covered with one or more antibodies selected from the group consisting of: anti-CD71, anti-CD235a, anti-CD36, anti-carbohydrates, anti-selectin, anti-CD45, anti-GPA, and anti-antigen-i.

FIG. 3A illustrates a path of a first analyte through an array of posts wherein an analyte that does not specifically bind to a post continues to migrate through the array, while an analyte that does bind a post is captured by the array. FIG. 3B is a picture of antibody coated posts. FIG. 3C illustrates coupling of antibodies to a substrate (e.g., obstacles, side walls, etc.) as contemplated by the present invention. Examples of such affinity-based separation modules are described in International Publication No. WO 2004/029221.

In some embodiments, a capture module utilizes a magnetic field to separate and/or enrich one or more analytes (cells) that have a magnetic property or magnetic potential. For example, red blood cells which are slightly diamagnetic (repelled by magnetic field) in physiological conditions can be made paramagnetic (attracted by magnetic field) by deoxygenation of the hemoglobin into methemoglobin. This magnetic property can be achieved through physical or chemical treatment of the red blood cells. Thus, a sample containing one or more red blood cells and one or more non-red blood cells can be enriched for the red blood cells by first inducing a magnetic property and then separating the above red blood cells from other analytes using a magnetic field (uniform or non-uniform). For example, a maternal blood sample can flow first through a size-based separation module to remove enucleated cells and cellular components (e.g., analytes having a hydrodynamic size less than 6 μm) based on size. Subsequently, the enriched nucleated cells (e.g., analytes having a hydrodynamic size greater than 6 μm), i.e., white blood cells and nucleated red blood cells, are treated with a reagent, such as CO₂, N₂ or NaNO₂, that changes the magnetic property of the red blood cells' hemoglobin. The treated sample then flows through a magnetic field (e.g., a column coupled to an external magnet), such that the paramagnetic analytes (e.g., red blood cells) will be captured by the magnetic field while the white blood cells and any other non-red blood cells will flow through the device to result in a sample enriched in nucleated red blood cells (including fnRBC's). Additional examples of magnetic separation modules are described in U.S. application Ser. No. 11/323,971, filed Dec. 29, 2005, entitled “Devices and Methods for Magnetic Enrichment of Cells and Other Particles” and U.S. application Ser. No. 11/227,904, filed Sep. 15, 2005, entitled “Devices and Methods for Enrichment and Alteration of Cells and Other Particles”.

Subsequent enrichment steps can be used to separate the rare cells (e.g. fnRBC's) from the non-rare maternal nucleated red blood cells (mnRBC's). In some embodiments, a sample enriched by size-based separation followed by affinity/magnetic separation is further enriched for rare cells using fluorescence activated cell sorting (FACS) or selective lysis of a subset of the cells (e.g. fetal cells). In some embodiments, fetal cells are selectively bound to an anti-antigen i binding moiety (e.g. an antibody) to separate them from the mnRBC's. In some embodiments, fetal cells or fetal DNA is distinguished from non-fetal cells or non-fetal DNA by forcing the rare cells (fetal cells) to become apoptotic, thus condensing their nuclei and optionally ejecting their nuclei. Rare cells such as fetal cells can be forced into apoptosis using various means including subjecting the cells to hyperbaric pressure (e.g. 4% CO₂). The condensed nuclei can be detected and/or isolated for further analysis using any technique known in the art including DNA gel electrophoresis, in situ labeling of DNA nicks (terminal deoxynucleotidyl transferase (TdT)-mediated dUTP in situ nick labeling, also known as TUNEL) (Gavrieli, Y., et al. J. Cell Biol. 119:493-501 (1992)), and ligation of DNA strand breaks having one or two-base 3′ overhangs (Taq polymerase-based in situ ligation) (Didenko V., et al. J. Cell Biol. 135:1369-76 (1996)).

In some embodiments, when the analyte desired to be separated (e.g., red blood cells or white blood cells) is not ferromagnetic or does not have a magnetic property, a magnetic particle (e.g., a bead) or compound (e.g., Fe³⁺) can be coupled to the analyte to give it a magnetic property. In some embodiments, a bead coupled to an antibody that selectively binds to an analyte of interest can be decorated with an antibody selected from the group of anti CD71 or CD75. In some embodiments a magnetic compound, such as Fe³⁺, can be coupled to an antibody such as those described above. The magnetic particles or magnetic antibodies herein may be coupled to any one or more of the devices described herein prior to contact with a sample or may be mixed with the sample prior to delivery of the sample to the device(s).

The magnetic field used to separate analytes/cells in any of the embodiments herein can be uniform or non-uniform as well as external or internal to the device(s) herein. An external magnetic field is one whose source is outside a device herein (e.g., container, channel, obstacles). An internal magnetic field is one whose source is within a device contemplated herein. An example of an internal magnetic field is one where magnetic particles may be attached to obstacles present in the device (or manipulated to create obstacles) to increase the surface area with which analytes can interact and thereby increase the likelihood of binding. Analytes captured by a magnetic field can be released by demagnetizing the magnetic regions retaining the magnetic particles. For selective release of analytes from regions, the demagnetization can be limited to selected obstacles or regions. For example, the magnetic field can be designed to be electromagnetic, enabling turn-on and turn-off of the magnetic fields for each individual region or obstacle at will.

FIG. 4 illustrates an embodiment of a device configured for capture and isolation of cells expressing the transferrin receptor from a complex mixture. Monoclonal antibodies to CD71 receptor are readily available off-the-shelf and can be covalently coupled to magnetic materials, such as, but not limited to, any conventional ferroparticles including ferrous doped polystyrene and ferroparticles or ferro-colloids (e.g., from Miltenyi or Dynal). The anti CD71 bound to magnetic particles is flowed into the device. The antibody coated particles are drawn to the obstacles (e.g., posts), floor, and walls and are retained by the strength of the magnetic field interaction between the particles and the magnetic field. The particles between the obstacles, and those loosely retained within the sphere of influence of the local magnetic fields away from the obstacles, are removed by a rinse.

One or more of the enrichment modules herein (e.g., size-based separation module(s) and capture module(s)) may be fluidly coupled in series or in parallel with one another. For example, a first outlet from a separation module can be fluidly coupled to a capture module. In some embodiments, the separation module and capture module are integrated such that a plurality of obstacles acts both to deflect certain analytes according to size and direct them in a path different than the direction of analyte(s) of interest, and also as a capture module to capture, retain, or bind certain analytes based on size, affinity, magnetism or other physical property.

In any of the embodiments herein, the enrichment steps performed have a specificity and/or sensitivity of 60, 70, 80, 90, 95, 96, 97, 98, 99, 99.1, 99.2, 99.3, 99.4, 99.5, 99.6, 99.7, 99.8, 99.9 or 99.95%. The retention rate of the enrichment module(s) herein is such that ≥50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 99.9% of the analytes or cells of interest (e.g., nucleated cells, nucleated red blood cells, or nuclei from red blood cells) are retained. Simultaneously, the enrichment modules are configured to remove ≥50, 60, 70, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 99.9% of all unwanted analytes (e.g., red blood-platelet enriched cells) from a sample.

Any or all of the enrichment steps can occur with minimal dilution of the sample. For example, in some embodiments the analytes of interest are retained in an enriched solution that is less than 50, 40, 30, 20, 10, 9.0, 8.0, 7.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, or 0.5 fold diluted from the original sample. In some embodiments, any or all of the enrichment steps increase the concentration of the analyte of interest (e.g., fetal cells), for example, by transferring them from the fluid sample to an enriched fluid sample (sometimes in a new fluid medium, such as a buffer). The new concentration of the analyte of interest may be at least 2, 4, 6, 8, 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000, 2,000,000, 5,000,000, 10,000,000, 20,000,000, 50,000,000, 100,000,000, 200,000,000, 500,000,000, 1,000,000,000, 2,000,000,000, or 5,000,000,000 fold more concentrated than in the original sample. For example, a 10 times concentration increase of a first cell type out of a blood sample means that the ratio of first cell type/all cells in a sample is 10 times greater after the sample was applied to the apparatus herein. Such concentration can take a fluid sample (e.g., a blood sample) of greater than 10, 15, 20, 50, or 100 mL total volume comprising rare components of interest, and it can concentrate such rare component of interest into a concentrated solution of less than 0.5, 1, 2, 3, 5, or 10 mL total volume.
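The fold-concentration definition above (ratio of first cell type to all cells, before versus after enrichment) can be sketched as follows. This is an illustrative calculation only; the function name and the cell counts are hypothetical and are not taken from the specification.

```python
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Return the factor by which the target fraction (target cells / all cells)
    increased between the original sample and the enriched sample."""
    fraction_before = target_before / total_before
    fraction_after = target_after / total_after
    return fraction_after / fraction_before


# Hypothetical counts: 20 target cells among 2,000,000 cells before enrichment,
# 18 target cells among 180,000 cells after enrichment.
fold = fold_enrichment(20, 2_000_000, 18, 180_000)
print(f"{fold:.1f}-fold enrichment")
```

Note that the fold value depends only on the two fractions, so a small loss of target cells during enrichment can still yield a large fold increase when most non-target cells are removed.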

The final concentration of rare cells in relation to non-rare cells after enrichment can be about 1/10,000-1/10, or 1/1,000-1/100. In some embodiments, the concentration of fetal cells to maternal cells may be up to 1/1,000, 1/100, or 1/10 or as low as 1/100, 1/1,000 or 1/10,000.

Thus, detection and analysis of the fetal cells can occur even if the non-fetal (e.g., maternal) cells are >50%, 60%, 70%, 80%, 90%, 95%, or 99% of all cells in a sample. In some embodiments, fetal cells are at a concentration of less than 1:2, 1:4, 1:10, 1:50, 1:100, 1:1000, 1:10,000, 1:100,000, 1:1,000,000, 1:10,000,000 or 1:100,000,000 of all cells in a mixed sample to be analyzed or at a concentration of less than 1×10⁻³, 1×10⁻⁴, 1×10⁻⁵, or 1×10⁻⁶ cells/μL of the mixed sample. Overall, a mixed sample (e.g., enriched sample) has up to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, or 100 total fetal cells.

Enriched target cells (e.g., fnRBC) can be “binned” prior to analysis of the enriched cells (FIGS. 17 and 18). Binning is any process which results in the reduction of complexity and/or total cell number of the enriched cell output. Binning may be performed by any method known in the art or described herein. One method of binning the enriched cells is by serial dilution. Such dilution may be carried out using any appropriate platform (e.g., PCR wells, microtiter plates). Other methods include nanofluidic systems which separate samples into droplets (e.g., BioTrove, Raindance, Fluidigm). Such nanofluidic systems may result in a single cell being present in a nanodroplet.

Binning may be preceded by positive selection for target cells including, but not limited to, affinity binding (e.g., using anti-CD71 antibodies). Alternately, negative selection of non-target cells may precede binning. For example, output from the size-based separation module may be passed through a magnetic hemoglobin enrichment module (MHEM) which selectively removes WBCs from the enriched sample.

For example, the possible cellular content of output from enriched maternal blood which has been passed through a size-based separation module (with or without further enrichment by passing the enriched sample through a MHEM) may consist of: 1) approximately 20 fnRBC; 2) 1,500 mnRBC; 3) 4,000-40,000 WBC; 4) 15×10⁶ RBC. If this sample is separated into 100 bins (PCR wells or other acceptable binning platform), the bins would be expected to contain: 1) 80 negative bins and 20 bins positive for one fnRBC; 2) 15 mnRBC per bin; 3) 40-400 WBC per bin; 4) 15×10⁴ RBC per bin. If separated into 10,000 bins, the bins would be expected to contain: 1) 9,980 negative bins and 20 bins positive for one fnRBC; 2) 8,500 negative bins and 1,500 bins positive for one mnRBC; 3) <1-4 WBC per bin; 4) 15×10² RBC per bin. One of skill in the art will recognize that the number of bins may be increased depending on experimental design and/or the platform used for binning. The reduced complexity of the binned cell populations may facilitate further genetic and cellular analysis of the target cells.
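The per-bin arithmetic behind such an estimate can be sketched as below. This is a simplified illustration, assuming cells are spread evenly and that rare cells each land in a different bin; the function name is hypothetical, and real binning would show statistical scatter around these averages.

```python
def bin_summary(cell_count, n_bins):
    """Return (mean cells per bin, positive bins, negative bins).

    Idealized assumption: each cell of a rare type falls into a different bin,
    so 'positive bins' is an upper bound and 'negative bins' a lower bound.
    """
    mean_per_bin = cell_count / n_bins
    positive_bins = min(cell_count, n_bins)
    negative_bins = n_bins - positive_bins
    return mean_per_bin, positive_bins, negative_bins


# Starting cell counts taken from the example in the text.
counts = {"fnRBC": 20, "mnRBC": 1_500, "RBC": 15_000_000}

for n_bins in (100, 10_000):
    print(f"--- {n_bins} bins ---")
    for cell_type, count in counts.items():
        mean, positive, negative = bin_summary(count, n_bins)
        print(f"{cell_type}: mean {mean:g} per bin, "
              f"{positive} positive / {negative} negative bins")
```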

Analysis may be performed on individual bins to confirm the presence of target cells (e.g., fnRBC) in the individual bin. Such analysis may consist of any method known in the art, including, but not limited to, FISH, PCR, STR detection, SNP analysis, biomarker detection, and sequence analysis (FIGS. 17 and 18).

Fetal Biomarkers

In some embodiments, fetal biomarkers may be used to detect and/or isolate fetal cells, after enrichment or after detection of fetal abnormality or lack thereof. For example, this may be performed by distinguishing between fetal and maternal nRBCs based on relative expression of a gene (e.g., DYS1, DYZ, CD-71, ε- and ζ-globin) that is differentially expressed during fetal development. In preferred embodiments, biomarker genes are differentially expressed in the first and/or second trimester. “Differentially expressed,” as applied to nucleotide sequences or polypeptide sequences in a cell or cell nuclei, refers to differences in over/under-expression of that sequence when compared to the level of expression of the same sequence in another sample, a control or a reference sample. In some embodiments, expression differences can be temporal and/or cell-specific. For example, for cell-specific expression of biomarkers, differential expression of one or more biomarkers in the cell(s) of interest can be higher or lower relative to background cell populations. Detection of such a difference in expression of the biomarker may indicate the presence of a rare cell (e.g., fnRBC) versus other cells in a mixed sample (e.g., background cell populations). In other embodiments, a ratio of two or more such biomarkers that are differentially expressed can be measured and used to detect rare cells.

In one embodiment, fetal biomarkers comprise differentially expressed hemoglobins. Erythroblasts (nRBCs) are very abundant in the early fetal circulation and virtually absent in normal adult blood, and because they have a short, finite lifespan, there is no risk of obtaining fnRBCs that persist from a previous pregnancy. Furthermore, unlike trophoblast cells, fetal erythroblasts are not prone to mosaic characteristics.

Yolk sac erythroblasts synthesize ε-, ζ-, γ- and α-globins, which combine to form the embryonic hemoglobins. Between six and eight weeks, the primary site of erythropoiesis shifts from the yolk sac to the liver, the three embryonic hemoglobins are replaced by fetal hemoglobin (HbF) as the predominant oxygen transport system, and ε- and ζ-globin production gives way to γ-, α- and β-globin production within definitive erythrocytes (Peschle et al., 1985). HbF remains the principal hemoglobin until birth, when the second globin switch occurs and β-globin production accelerates.

Hemoglobin (Hb) is a heterotetramer composed of two identical α globin chains and two copies of a second globin. Due to differential gene expression during fetal development, the composition of the second chain changes from ε globin during early embryonic development (1 to 4 weeks of gestation) to γ globin during fetal development (6 to 8 weeks of gestation) to β globin in neonates and adults, as illustrated in Table 1.

TABLE 1 Relative expression of ε, γ and β in maternal and fetal RBCs.

                            ε      γ      β
1st trimester   Fetal       ++     ++     −
                Maternal    −      +/−    ++
2nd trimester   Fetal       −      ++     +/−
                Maternal    −      +/−    ++

In the late first trimester, the earliest time that fetal cells may be sampled by CVS, fnRBCs contain, in addition to α globin, primarily ε and γ globin. In the early to mid second trimester, when amniocentesis is typically performed, fnRBCs contain primarily γ globin with some adult β globin. Maternal cells contain almost exclusively α and β globin, with traces of γ detectable in some samples. Therefore, by measuring the relative expression of the ε, γ and β genes in RBCs purified from maternal blood samples, the presence of fetal cells in the sample can be determined. Furthermore, positive controls can be utilized to assess failure of the FISH analysis itself.

In various embodiments, fetal cells are distinguished from maternal cells based on the differential expression of hemoglobins β, γ or ε. Expression levels or RNA levels can be determined in the cytoplasm or in the nucleus of cells. Thus, in some embodiments, the methods herein involve determining levels of messenger RNA (mRNA), ribosomal RNA (rRNA), or nuclear RNA (nRNA).

In some embodiments, identification of fnRBCs can be achieved by measuring the levels of at least two hemoglobins in the cytoplasm or nucleus of a cell. In various embodiments, identification and assay is from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 fetal nuclei. Furthermore, total nuclei arrayed on one or more slides can number from about 100, 200, 300, 400, 500, 700, 800, 5000, 10,000, 100,000, 1,000,000, 2,000,000 to about 3,000,000. In some embodiments, a ratio for γ/β or ε/β is used to determine the presence of fetal cells, where a number less than one indicates that a fnRBC(s) is not present. In some embodiments, the relative expression of γ/β or ε/β provides a fnRBC index (“FNI”), as measured by γ or ε relative to β. In some embodiments, a FNI for γ/β greater than 5, 10, 15, 20, 25, 30, 35, 40, 45, 90, 180, 360, 720, 975, 1020, 1024, or about 1250 indicates that a fnRBC(s) is present. In yet other embodiments, a FNI for γ/β of less than about 1 indicates that a fnRBC(s) is not present. Preferably, the above FNI is determined from a sample obtained during a first trimester. However, similar ratios can be used during the second trimester and third trimester.
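The FNI logic can be sketched as a simple ratio test. In this illustration the function names, the signal values, and the "indeterminate" category for ratios between the two cut-offs are assumptions; the "present" threshold of 5 is merely one of the values listed above, chosen here as a default.

```python
def fetal_nrbc_index(fetal_globin_signal, beta_signal):
    """FNI: expression of a fetal globin (gamma or epsilon) relative to beta."""
    if beta_signal <= 0:
        raise ValueError("beta globin signal must be positive")
    return fetal_globin_signal / beta_signal


def classify_fni(fni, present_threshold=5.0):
    """Interpret an FNI value.

    FNI < 1 -> fnRBC not present; FNI > present_threshold -> fnRBC present.
    Values in between are labeled 'indeterminate' (an assumption of this sketch).
    """
    if fni < 1.0:
        return "fnRBC not present"
    if fni > present_threshold:
        return "fnRBC present"
    return "indeterminate"


# Hypothetical gamma and beta expression measurements for two cells.
print(classify_fni(fetal_nrbc_index(120.0, 4.0)))  # gamma/beta = 30
print(classify_fni(fetal_nrbc_index(2.0, 8.0)))    # gamma/beta = 0.25
```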

In some embodiments, the expression levels are determined by measuring nuclear RNA transcripts, including nascent or unprocessed transcripts. In another embodiment, expression levels are determined by measuring mRNA, including ribosomal RNA. There are many methods known in the art for imaging (e.g., measuring) nucleic acids or RNA including, but not limited to, using expression arrays from Affymetrix, Inc. or Illumina, Inc.

RT-PCR primers can be designed by targeting the globin variable regions, selecting the amplicon size, and adjusting the primers' annealing temperature to achieve equal PCR amplification efficiency. Thus, TaqMan probes can be designed for each of the amplicons with well-separated fluorescent dyes: Alexa Fluor®-355 for ε, Alexa Fluor®-488 for γ, and Alexa Fluor®-555 for β. The specificity of these primers can be first verified using ε, γ, and β cDNA as templates. The primer sets that give the best specificity can be selected for further assay development. As an alternative, the primers can be selected from two exons spanning an intron sequence so that only the mRNA is amplified, eliminating genomic DNA contamination.

The primers selected can be tested first in a duplex format to verify their specificity, limit of detection, and amplification efficiency using target cDNA templates. The best combinations of primers can be further tested in a triplex format for their amplification efficiency, detection dynamic range, and limit of detection.

Various reagents are commercially available for RT-PCR, such as one-step RT-PCR reagents, including the Qiagen One-Step RT-PCR Kit and the Applied Biosystems TaqMan One-Step RT-PCR Master Mix Reagents kit. Such reagents can be used to establish the expression ratio of ε, γ, and β using purified RNA from enriched samples. Forward primers can be labeled for each of the targets, using Alexa Fluor-355 for ε, Alexa Fluor-488 for γ, and Alexa Fluor-555 for β. Enriched cells can be deposited by cytospinning onto glass slides. Additionally, cytospinning the enriched cells can be performed after in situ RT-PCR. Thereafter, the presence of the fluorescent-labeled amplicons can be visualized by fluorescence microscopy. The reverse transcription time and PCR cycles can be optimized to maximize the amplicon signal:background ratio to have maximal separation of fetal over maternal signature. Preferably, the signal:background ratio is greater than 5, 10, 50 or 100 and the overall cell loss during the process is less than 50, 10 or 5%.

Fetal Cell Analysis

In step 125, DNA is extracted and purified from cells/nuclei of the enriched product (mixed sample enriched) and reference sample. Methods for extracting DNA are known to those skilled in the art.

In step 131, the DNA is optionally pre-amplified to increase the overall quantity of DNA for subsequent analysis. Pre-amplification of DNA can be conducted using any amplification method known in the art, including, for example, amplification via multiple displacement amplification (MDA) (Gonzalez J M, et al. Cold Spring Harb Symp Quant Biol; 68:69-78 (2003), Murthy et al. Hum Mutat 26(2):145-52 (2005) and Paulland et al., Biotechniques; 38(4):553-4, 556, 558-9 (2005)), and linear amplification methods such as in vitro transcription (Liu, et al., BMC Genomics; 4(1):19 (2003)).

Other methods for pre-amplification include PCR methods, including quantitative PCR, quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), real time PCR (RT-PCR), single cell PCR, PCR-RFLP/RT-PCR-RFLP, hot start PCR and nested PCR. For example, the PCR products can be directly sequenced bi-directionally by dye-terminator sequencing. PCR can be performed in a 384-well plate in a volume of 15 μl containing 5 ng genomic DNA, 2 mM MgCl₂, 0.75 μl DMSO, 1 M Betaine, 0.2 mM dNTPs, 20 pmol primers, 0.2 μl AmpliTaq Gold® (Applied Biosystems), and 1× buffer (supplied with AmpliTaq Gold). Thermal cycling conditions are as follows: 95° C. for 10 minutes; 95° C. for 30 seconds, 60° C. for 30 seconds, 72° C. for 1 minute for 30 cycles; and 72° C. for 10 minutes. PCR products can be purified with Ampure® Magnetic Beads (Agencourt) and can be optionally separated by capillary electrophoresis on an ABI3730 DNA Analyzer (Applied Biosystems).
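The thermal cycling conditions given above can be written down as a small program table, which also makes the total hold time easy to check. This is only a sketch: the variable and function names are hypothetical, and instrument ramp times between temperatures are ignored.

```python
# Each step is (temperature in degrees C, hold time in seconds), as stated in the text.
INITIAL_DENATURATION = (95, 10 * 60)
CYCLE_STEPS = [
    (95, 30),  # denaturation
    (60, 30),  # annealing
    (72, 60),  # extension
]
N_CYCLES = 30
FINAL_EXTENSION = (72, 10 * 60)


def total_hold_seconds():
    """Sum of all programmed hold times, excluding ramping between temperatures."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


print(f"Total hold time: {total_hold_seconds() / 60:.0f} minutes")
```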

Other suitable amplification methods include the ligase chain reaction (LCR), transcription amplification, self-sustained sequence replication, selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR) and nucleic acid based sequence amplification (NABSA). Other amplification methods that may be used in step 131 include those described in U.S. Pat. Nos. 5,242,794, 5,494,810, 4,988,617 and 6,582,938, each of which is incorporated herein by reference.

The pre-amplification step increases the amount of enriched fetal DNA, thus allowing analysis to be performed even if up to 1 μg, 500 ng, 200 ng, 100 ng, 50 ng, 40 ng, 30 ng, 20 ng, 10 ng, 5 ng, 1 ng, 500 pg, 200 pg, 100 pg, 50 pg, 40 pg, 30 pg, 20 pg, 10 pg, 5 pg, or 1 pg of fetal or total DNA was obtained from the mixed sample, or between 1-5 μg, 5-10 μg, or 10-50 μg of fetal or total DNA was obtained from the mixed sample.

In step 141, SNP(s) are detected from DNA of both mixed and reference samples using any method known in the art. Detection can involve detecting an abundance of a nucleotide base at a SNP position. Detection can be accomplished using a DNA microarray, bead microarray, or high throughput sequencing. In some instances, SNPs are detected using highly parallel SNP detection methods such as those described in Fan J B, et al. Cold Spring Harb Symp Quant Biol; 68:69-78 (2003); Moorhead M, et al. Eur. J. Hum Genet 14:207-215 (2005); Wang Y, et al. Nucleic Acids Res; 33(21):e183 (2005). Highly parallel SNP detection provides information about genotype and gene copy numbers at a large number of loci scattered across the genome in one procedure. In some cases, highly parallel SNP detection involves performing SNP specific ligation-extension reactions, followed by amplification of the products. The readout of the SNP types can be done using DNA microarrays (Gunderson et al. Nat. Genet. 37(5):549-54 (2005)), bead arrays (Shen, et al., Mutat. Res; 573 (1-2):70-82 (2005)), or by sequencing, such as high throughput sequencing (e.g., Margulies et al. Nature, 437 (7057):376-80 (2005)) of individual amplicons.

In some embodiments, cDNAs, which are reverse transcribed from mRNAs obtained from fetal or maternal cells, are analyzed for the presence of SNPs using the methods disclosed within. The type and abundance of the cDNAs can be used to determine whether a cell is a fetal cell (such as by the presence of Y chromosome specific transcripts) or whether the fetal cell has a genetic abnormality (such as aneuploidy, abundance of alternative transcripts or problems with DNA methylation or imprinting).

In one embodiment, fetal or maternal cells or nuclei are enriched using one or more methods disclosed herein. Preferably, fetal cells are enriched by flowing the sample through an array of obstacles that selectively directs particles or cells of different hydrodynamic sizes into different outlets such that fetal cells and cells larger than fetal cells are directed into a first outlet and one or more cells or particles smaller than the rare cells are directed into a second outlet.

Total RNA or poly-A mRNA is then obtained from enriched cell(s) (fetal or maternal cells) using purification techniques known in the art. Generally, about 1 μg-2 μg of total RNA is sufficient. Next, a first-strand complementary DNA (cDNA) is synthesized using reverse transcriptase and a single T7-oligo(dT) primer. Next, a second-strand cDNA is synthesized using DNA ligase, DNA polymerase, and RNase enzyme. Next, the double stranded cDNA (ds-cDNA) is purified.

In another embodiment, total RNA is extracted from enriched cells (fetal cells or maternal cells). Next, two one-quarter scale MessageAmp II reactions (Ambion, Austin, Tex.) are performed for each RNA extraction using 200 ng of total RNA. MessageAmp is a procedure based on antisense RNA (aRNA) amplification, and involves a series of enzymatic reactions resulting in linear amplification of exceedingly small amounts of RNA for use in array analysis. Unlike exponential RNA amplification methods, such as NASBA and RT-PCR, aRNA amplification maintains representation of the starting mRNA population. The procedure begins with total or poly(A) RNA that is reverse transcribed using a primer containing both oligo(dT) and a T7 RNA polymerase promoter sequence. After first-strand synthesis, the reaction is treated with RNase H to cleave the mRNA into small fragments. These small RNA fragments serve as primers during a second-strand synthesis reaction that produces a double-stranded cDNA template.

Any DNA microarray that is capable of detecting one or more SNPs can be used with the methods herein. DNA microarrays comprise a plurality of genetic probes immobilized at discrete sites (i.e., defined locations or assigned positions) on a substrate surface. A DNA microarray preferably monitors at least 5, 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000, 200,000 or 500,000 different SNPs. Such SNPs can be located in one or more target chromosomes or over the entire genome. Methods for manufacturing DNA microarrays for detecting SNPs are known in the art. Microarrays that can be used in the systems herein include those commercially available from Affymetrix (Santa Clara, Calif.), Illumina (San Diego, Calif.), Spectral Genomics, Inc. (Houston, Tex.), and Vysis Corporation (Downers Grove, Ill.). Methods for detecting SNPs using microarrays are further described in U.S. Pat. Nos. 6,300,063, 5,837,832, 6,969,589, 6,040,138, and 6,858,412.

In one embodiment, SNPs are detected using molecular inversion probes (MIPs). MIPs are nearly circularized probes having a first end of the probe complementary to a region immediately upstream of the SNP to be detected, and a second end of the probe complementary to a region immediately downstream of the SNP. To use MIPs, both ends are allowed to hybridize to genomic regions surrounding the SNP, and an enzymatic reaction fills the gap at the SNP position in an allele specific manner. The fully circular probe now can be separated by a simple exonuclease reaction which leaves a primer sequence coupled to a label unique to the allele. The primer is subsequently used to amplify the label, which is then hybridized to an array for detection.

FIG. 5 illustrates one embodiment of an allele specific extension and ligation reaction. Genomic DNA fragments are first annealed to a solid support. Subsequently, probes designed to be unique for each allele (P1 and P2) are annealed to the target DNA. After a washing step, allele-specific primer extension is conducted to extend the probes if such probes have 3′ ends that are complementary to their cognate SNP in the genomic DNA template. The extension is followed by ligation of the extended templates to their corresponding locus-specific probes (P3) to create PCR templates. Requiring the joining of two fragments (P1 and P3 or P2 and P3) to create a PCR template provides an additional level of genomic specificity, because any residual incorrectly hybridized allele-specific or locus-specific probes are unlikely to be adjacent and thus should not be able to ligate. Next, fluorescently labeled primers, each with a different dye, are added for PCR amplification, thus providing a means for detection and quantification of each SNP by providing data points. In addition, each SNP is assigned a different address sequence (P3) which is contained within the locus-specific probe. Each address sequence is complementary to a unique capture sequence that can be contained by one of several bead types present in an array. Furthermore, the use of universal PCR primers to associate a fluorescent dye with each SNP allele provides a cost-saving element, because only three primers, two labeled and one unlabeled, are needed regardless of the number of SNPs to be assayed.

If the addresses are captured by beads, multiple SNPs can be amplified in the same or in different reactions using bead amplification. When more than one DNA polymorphism is used in the same amplification reaction, primers are chosen to be multiplexable (fairly uniform melting temperature, absence of cross-priming on the human genome, and absence of primer-primer interaction based on sequence analysis) with other pairs of primers. Furthermore, primers and loci may be chosen so that the amplicon lengths from a given locus do not overlap with those from another locus. Multiple dyes and multi-color fluorescence readout may be used to increase the multiplexing capacity.
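One of the multiplexing criteria above, a fairly uniform melting temperature, can be screened with a quick calculation. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough approximation for short oligonucleotides that the specification itself does not prescribe; the primer sequences, the 5 °C tolerance and the function names are hypothetical.

```python
def wallace_tm(primer):
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def uniform_tm(primers, tolerance_c=5):
    """True if all primers fall within tolerance_c degrees of one another."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms) <= tolerance_c


# Hypothetical 20-mers for illustration only.
primer_a = "ACGTACGTACGTACGTACGT"
primer_b = "ATATATGCGCGCATATGCGC"
primer_c = "GCGCGCGCGCGCGCGCGCGC"

print(wallace_tm(primer_a), wallace_tm(primer_b), wallace_tm(primer_c))
print(uniform_tm([primer_a, primer_b]))
print(uniform_tm([primer_a, primer_b, primer_c]))
```

Cross-priming on the genome and primer-primer interactions require sequence alignment and are outside the scope of this sketch.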

In some embodiments, highly parallel SNP detection is performed by arrayed primer extension (APEX). In order to perform APEX, a gene locus is chosen where one wishes to analyze SNPs or mutations, for example, loci for abnormal ploidy disorders (e.g., chromosome X, 13, 18, and 21). Oligonucleotides (e.g., about 20-, 25-, 30-, 40-, 50-mers) are designed to be complementary to the gene up to, but not including, the base where the mutation or SNP exists. In one example, the oligonucleotides are modified with an amine group at the 5′ end to facilitate covalent binding to activated glass slides, in this case epoxy silanized surfaces. The locus in question is PCR amplified and the DNA enzymatically sheared to facilitate hybridization to the oligos. The PCR reactions contain dTTP and dUTP at about a 5 to 1 ratio, and the incorporation of the dUTP allows the amplified DNA to be enzymatically cut with uracil N-glycosylase (UNG). The optimal size of the sheared DNA is about 100 base pairs. The sheared DNA is then hybridized to the bound oligos and a primer extension reaction carried out using a thermostable DNA polymerase such as Thermo Sequenase (Amersham Pharmacia Biotech) or AmpliTaq FS (Roche Molecular Systems). The primer extension reaction contains four dideoxynucleotides (ddNTPs) corresponding to A, G, C & T, with each ddNTP containing a distinct fluor molecule. In the above example, ddNTPs can be conjugated to either fluorescein, Cy3™, Texas Red or Cy5™. Depending on which base is next in the sequence (wild type, mutant or SNP), the primer extension reaction will incorporate one nucleotide with one and only one of the four dyes. Thus, by applying a simple four laser scan one can tell which base is next in the sequence, as each of the above dyes is easily spectrally separable.
A large number of different oligos (e.g., 5-, 10-, 15-, 20-, 30-, 40-, 50-, 60-, 70-, 80-, 90-, 100-thousand probes) may be attached to a slide for this type of analysis, with the requirement that very little cross hybridization occurs among all the sequences. It may be helpful to increase the length of the oligos (e.g., 50-, 60-, 70-, 80-mers) so that the initial hybridization can be done at higher stringency, resulting in less background from non-homologous hybridization. In the APEX method, the signal to noise ratio is about 40 to 1, a level which is more than sufficient for unambiguously identifying SNPs and mutations. To design such large arrays for SNP analysis, a computational screen can be conducted to favor a subset of sequences with similar GC content and thermodynamic properties, and eliminate sequences with possible secondary structure or sequence similarity to other tags. Shoemaker et al. Nature Genetics 14:450-456 (1996); Giaever et al. Nature Genetics 21:278-283 (1999); Winzeler et al. Science 285:901-906 (1999). For example, in a high density tag array, 64,000 probes, each probe occupying an area of 30×30 μm, are used for parallel genotyping of human SNPs.
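The four-dye APEX readout described above amounts to picking, at each array spot, the dye channel that clearly dominates. A minimal sketch follows; the assignment of dyes to bases, the intensity values and the minimum dominance ratio are all assumptions made for illustration, and a spot showing two strong dyes (as a heterozygous position might) is simply reported as ambiguous here.

```python
# Hypothetical dye-to-base assignment; the specification does not state one.
DYE_TO_BASE = {"fluorescein": "A", "Cy3": "C", "Texas Red": "G", "Cy5": "T"}


def call_base(intensities, min_ratio=5.0):
    """Return the base for the brightest dye channel at one spot.

    Returns None when the brightest channel is not at least min_ratio times
    the second brightest, i.e. when no single dye clearly dominates.
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (top_dye, top_signal), (_, second_signal) = ranked[0], ranked[1]
    if second_signal > 0 and top_signal / second_signal < min_ratio:
        return None
    return DYE_TO_BASE[top_dye]


# Hypothetical four-channel scan of two spots.
clear_spot = {"fluorescein": 4000, "Cy3": 90, "Texas Red": 100, "Cy5": 80}
mixed_spot = {"fluorescein": 1000, "Cy3": 900, "Texas Red": 50, "Cy5": 40}
print(call_base(clear_spot))
print(call_base(mixed_spot))
```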

In some cases, it may be desirable to introduce a novel restriction site in the region of the mutation to create cleavage-based detection. Gasparini, et al., Mol. Cell Probes 6:1 (1992). Amplification is subsequently performed using Taq ligase and the like. Barany, Proc. Natl. Acad. Sci. USA 88:189 (1991). In such cases, ligation will occur only if there is a perfect match at the 3′-terminus of the 5′ sequence, making it possible to detect the presence of a known mutation at a specific site by looking for the presence or absence of amplification.

Alternatively, detection of single strand conformation polymorphism (SSCP) may be used to detect differences in electrophoretic mobility between mutant and wild type nucleic acids (e.g., SNP). Orita, et al., Proc. Natl. Acad. Sci. USA 86: 2766 (1989); Cotton, Mutat. Res. 285: 125-144 (1993); and Hayashi, Genet. Anal. Tech. Appl. 9: 73-79 (1992). Single-stranded DNA fragments of sample and control nucleic acids will be denatured and allowed to renature. The secondary structure of single-stranded nucleic acids varies according to sequence, and the resulting alteration in electrophoretic mobility enables the detection of even a single base change. The DNA fragments may be labeled or detected with labeled probes. The sensitivity of the assay may be enhanced by using RNA (rather than DNA), in which the secondary structure is more sensitive to a change in sequence. The subject method utilizes heteroduplex analysis to separate double-stranded heteroduplex molecules on the basis of changes in electrophoretic mobility. Keen, et al., Trends Genet. 7: 5 (1991).

Other methods for detecting SNPs include methods in which protection from cleavage agents is used to detect mismatched bases in DNA/RNA or RNA/DNA heteroduplexes. Myers, et al., Science 230: 1242 (1985). In general, the art technique of “mismatch cleavage” starts by providing heteroduplexes formed by hybridizing (labeled) RNA or DNA containing the control sequence with potentially mutant RNA or DNA obtained from a tissue sample. The double-stranded duplexes are treated with an agent that cleaves single stranded regions of the duplex, such as those that exist due to “base pair mismatches” between the control and sample strands. For instance, RNA/DNA duplexes can be treated with RNase and DNA/DNA hybrids treated with S1 nuclease to enzymatically digest the mismatched regions. Furthermore, either DNA/DNA or RNA/DNA duplexes can be treated with hydroxylamine or osmium tetroxide and with piperidine in order to digest mismatched regions. After digestion of the mismatched regions, the resulting material is then separated by size on denaturing polyacrylamide gels to determine the site of mutation. Cotton, et al., Proc. Natl. Acad. Sci. USA 85:4397 (1988); and Saleeba, et al., Methods Enzymol. 217: 286-295 (1992). The control DNA or RNA can be labeled for detection.

SNPs can also be detected and quantified using sequencing methods, including the classic Sanger sequencing method as well as high throughput sequencing, which may be capable of generating at least 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 100,000 or 500,000 sequence reads per hour, with at least 50, 60, 70, 80, 90, 100, 120 or 150 bases per read.

High throughput sequencing can involve sequencing-by-synthesis, sequencing-by-ligation, and ultra deep sequencing.

Sequence-by-synthesis can be initiated using sequencing primers complementary to the sequencing element on the nucleic acid tags. The method involves detecting the identity of each nucleotide immediately after (substantially real-time) or upon (real-time) the incorporation of a labeled nucleotide or nucleotide analog into a growing strand of a complementary nucleic acid sequence in a polymerase reaction. After the successful incorporation of a labeled nucleotide, a signal is measured and then nulled by methods known in the art. Examples of sequence-by-synthesis methods are described in U.S. Application Publication Nos. 2003/0044781, 2006/0024711, 2006/0024678 and 2005/0100932. Examples of labels that can be used to label nucleotides or nucleotide analogs for sequencing-by-synthesis include, but are not limited to, chromophores, fluorescent moieties, enzymes, antigens, heavy metal, magnetic probes, dyes, phosphorescent groups, radioactive materials, chemiluminescent moieties, scattering or fluorescent nanoparticles, Raman signal generating moieties, and electrochemical detection moieties. Sequencing-by-synthesis can generate at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 reads per hour. Such reads can have at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases per read.

Another sequencing method involves hybridizing the amplified regions to a primer complementary to the sequence element in an LST. This hybridization complex is incubated with a polymerase, ATP sulfurylase, luciferase, apyrase, and the substrates luciferin and adenosine 5′ phosphosulfate. Next, deoxynucleotide triphosphates corresponding to the bases A, C, G, and T (U) are added sequentially. Each base incorporation is accompanied by release of pyrophosphate, which is converted to ATP by sulfurylase, which in turn drives synthesis of oxyluciferin and the release of visible light. Since pyrophosphate release is equimolar with the number of incorporated bases, the light given off is proportional to the number of nucleotides added in any one step. The process is repeated until the entire sequence is determined.
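Because the light emitted in each step is proportional to the number of nucleotides incorporated, the readout of this method can be modeled as a list of (nucleotide, signal) pairs, one per nucleotide addition. The sketch below is an idealized, noise-free simulation; the function names, the A, C, G, T addition order and the example sequence are assumptions for illustration.

```python
def flowgram(synthesized_strand, flow_order="ACGT", max_flows=400):
    """Simulate idealized light signals for sequential nucleotide additions.

    synthesized_strand is the strand being built by the polymerase. Each flow
    adds one nucleotide type; the signal equals the number of consecutive
    bases of that type incorporated (0 if the next base does not match).
    """
    signals = []
    position = 0
    flows = 0
    while position < len(synthesized_strand) and flows < max_flows:
        nucleotide = flow_order[flows % len(flow_order)]
        run_length = 0
        while (position < len(synthesized_strand)
               and synthesized_strand[position] == nucleotide):
            run_length += 1
            position += 1
        signals.append((nucleotide, run_length))
        flows += 1
    return signals


def decode(signals):
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nucleotide * count for nucleotide, count in signals)


reads = flowgram("TTAG")
print(reads)
print(decode(reads))
```

In practice, signal intensities are not exact integers, which is why long runs of the same base are harder to resolve than this idealized model suggests.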

Yet another sequencing method involves a four-color sequencing by ligation scheme (degenerate ligation), which involves hybridizing an anchor primer to one of four positions. Then an enzymatic ligation reaction of the anchor primer to a population of degenerate nonamers that are labeled with fluorescent dyes is performed. At any given cycle, the population of nonamers that is used is structured such that the identity of one of its positions is correlated with the identity of the fluorophore attached to that nonamer. To the extent that the ligase discriminates for complementarity at that queried position, the fluorescent signal allows the inference of the identity of the base. After performing the ligation and four-color imaging, the anchor primer:nonamer complexes are stripped and a new cycle begins. Methods to image sequence information after performing ligation are known in the art.

In some cases, high throughput sequencing involves the use of ultra-deep sequencing, such as described in Margulies et al., Nature 437 (7057):376-80 (2005). Briefly, the amplicons are diluted and mixed with beads such that each bead captures a single molecule of the amplified material. The DNA molecule on each bead is then amplified to generate millions of copies of the sequence, which all remain bound to the bead. Such amplification can occur by PCR. Each bead can be placed in a separate well, which can be an (optionally addressable) picolitre-sized well. In some embodiments, each bead is captured within a droplet of a PCR-reaction-mixture-in-oil-emulsion and PCR amplification occurs within each droplet. The amplification on the bead results in each bead carrying at least one million, at least 5 million, or at least 10 million copies of the original amplicon coupled to it. Finally, the beads are placed into a highly parallel sequencing by synthesis machine which generates over 400,000 reads (˜100 bp per read) in a single 4 hour run.

Other methods for ultra-deep sequencing that can be used are described in Hong, S. et al. Nat. Biotechnol. 22(4):435-9 (2004); Bennett, B. et al. Pharmacogenomics 6(4):373-82 (2005); Shendure, P. et al. Science 309 (5741):1728-32 (2005).

The microarray or sequencing methods described herein provide a readout, which can be visualized via apparatus and methods known in the art. For example, for a given marker or at a given tag probe position, the fluorescence intensity of each of the fluorophores utilized (e.g., tagged sequencing or PCR primers) provides a signal which is detected by apparatus or automated systems/machines known in the art. The fluorophore markers can be utilized either in an array-based or sequencing-based analysis.

In step 151, SNP data is used to determine aneuploidy by, e.g., determining the ratio of maternal allele(s) to paternal allele(s) (or vice versa); or determining the ratio of maternal allele(s) in a region suspected of aneuploidy versus in a control region.

Aneuploidy means the condition of having less than or more than the normal diploid number of chromosomes. In other words, it is any deviation from euploidy. Aneuploidy includes conditions such as monosomy (the presence of only one chromosome of a pair in a cell's nucleus), trisomy (having three chromosomes of a particular type in a cell's nucleus), tetrasomy (having four chromosomes of a particular type in a cell's nucleus), pentasomy (having five chromosomes of a particular type in a cell's nucleus), triploidy (having three of every chromosome in a cell's nucleus), and tetraploidy (having four of every chromosome in a cell's nucleus). Birth of a live triploid is extraordinarily rare and such individuals are quite abnormal; however, triploidy occurs in about 2-3% of all human pregnancies and appears to be a factor in about 15% of all miscarriages. Tetraploidy occurs in approximately 8% of all miscarriages. (http://www.emedicine.com/med/topic3241.htm).

In one embodiment, kits are provided which include a separation device, optionally a capture device and the reagents and devices used for the analysis of the genomic sequences. For example, the kit may include the separation arrays and DNA microarrays for detecting one or more SNPs. Any of the devices mentioned for the DNA determination may be combined with the separation devices. The combination of the array separation devices with DNA analysis devices provides gentle handling and accurate analysis.

A simple intuitive understanding of the effect of trisomy is that it increases the abundances of fetal alleles at loci within the affected region. Trisomies are predominantly from maternal non-disjunction events, so typically both maternal alleles, and a single paternal allele, are increased, and the ratio of maternal allele abundance to paternal allele abundance is higher in the trisomic region. These signatures may be masked by differences in DNA amplification and hybridization efficiency from locus to locus, and from allele to allele.

In one embodiment, trisomies are determined by comparing abundance (e.g. intensities) of maternal and paternal alleles in a genomic region. Within a locus, the PCR differences are smaller than between loci, because the same primers are responsible for all the different allele amplicons at that locus. Therefore, the allele ratios may be more stable than the overall allele abundances. This can be exploited by identifying loci where the paternal allele is distinct from the maternal allele and taking the ratio of the paternal allele strength to the average of the maternal allele strengths. These allele ratios then can be averaged over the hypothesized aneuploidy region and compared to the average over a control region. The distributions of these ratio values in the hypothesized aneuploidy region and in the control region can be compared to create an estimate of statistical significance for the observed difference in means. A simple example of this procedure would use Student's t-test.
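The comparison of per-locus allele ratios between a hypothesized aneuploidy region and a control region can be sketched as follows. This is an illustrative sketch only: the helper names `allele_ratio` and `student_t`, and all intensity values, are hypothetical, and the pooled-variance form of Student's t statistic is assumed. Converting the statistic to a p-value would require a t-distribution table or a statistics library and is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def allele_ratio(paternal, maternal_1, maternal_2):
    """Paternal allele strength divided by the average maternal allele strength."""
    return paternal / ((maternal_1 + maternal_2) / 2.0)


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return t, dof


# Hypothetical (paternal, maternal 1, maternal 2) intensities at informative loci.
test_loci = [(0.20, 1.00, 1.02), (0.19, 0.98, 1.00), (0.21, 1.01, 0.99), (0.20, 1.00, 1.00)]
control_loci = [(0.20, 1.00, 0.80), (0.21, 0.82, 1.00), (0.19, 1.00, 0.79), (0.20, 0.80, 1.00)]

test_ratios = [allele_ratio(*locus) for locus in test_loci]
control_ratios = [allele_ratio(*locus) for locus in control_loci]
t_statistic, degrees_of_freedom = student_t(test_ratios, control_ratios)
print(t_statistic, degrees_of_freedom)
```

In this made-up data set both maternal alleles are elevated in the test region, so the paternal-to-maternal ratio is lower there than in the control region and the t statistic is negative.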

Thus, the present invention contemplates detection of fetal abnormality by determining a ratio of abundance of maternal allele(s) and abundance of paternal allele(s) (or vice versa) in one or more genomic regions of interest. (Preferably the paternal allele differs from one or both of the maternal alleles.) The genomic region can be derived from a mixed sample comprising fetal and maternal cells. The sample can be obtained from a pregnant animal and can be, e.g., a blood sample. In some cases, the genomic region includes a SNP and/or an informative SNP. In some cases at least 10, 20, 50, 100, 200 or 500 SNPs are analyzed per sample. The SNPs analyzed can be in a single locus, different loci, a single chromosome, or different chromosomes. In some cases, a first genomic region (SNP) analyzed is in a genomic region suspected of being trisomic or is trisomic and a second genomic region (SNP) analyzed is in a control region that is non-trisomic or a region suspected of being non-trisomic. The ratio of alleles (e.g., maternal/paternal) in a first genomic region or first plurality of genomic regions (trisomic) (hereinafter test regions) is then compared with a ratio of alleles (e.g., maternal/paternal) in the second genomic region or second plurality of genomic regions (hereinafter control regions). The control region(s) and test region(s) can be on the same or different chromosomes. In some instances, comparison is made by determining the difference in means of the ratios in the test regions and control regions. Detection of an increase of paternal abundance in the test region(s) is indicative of paternal trisomy, while detection of an increase of maternal abundance in the test region(s) is indicative of maternal trisomy. Furthermore, calculation of error rate based on amplification can be performed prior to making a call as to whether a fetus has a specific condition (e.g., trisomy) or not.

Alternatively, the maternal allele strengths over the suspected aneuploidy region(s) can be compared to those in the control region(s), all without forming ratios to paternal alleles. In this approach, errors in the measurement of the paternal allele abundances are not calculated. However, differences in amplification efficiency between primer pairs are calculated. These differences can be larger than differences between alleles in the same locus. In this approach there also may be a residual bias between the efficiencies averaged over certain chromosomes. Therefore it may be useful to also perform the same detection process in a reference sample (e.g. maternal only cell sample) and then take the ratio of ratios. In other words, the ratio obtained for the mixed sample of the abundance in test genomic region(s) and control genomic region(s) is divided by the same ratio obtained from the reference sample. The ratios obtained for the mixed and reference samples reflect allele strength over the suspected aneuploidy region over allele strength over the control region, and the ratio of ratios presents an estimate that is normalized to the reference (maternal) sample. Such a ratio of ratios is therefore free of chromosome bias, but may include errors in the measurements of the reference sample, as that sample is used as the control or normalizer.
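The ratio-of-ratios normalization can be illustrated with a short calculation. The sketch below assumes that allele strengths within a region are simply averaged; the function names and all numerical values are hypothetical and serve only to show how a chromosome-level bias cancels.

```python
from statistics import mean


def region_ratio(test_strengths, control_strengths):
    """Average maternal allele strength in the test region over the control region."""
    return mean(test_strengths) / mean(control_strengths)


def ratio_of_ratios(mixed_test, mixed_control, ref_test, ref_control):
    """Mixed-sample region ratio normalized by the reference-sample region ratio."""
    return region_ratio(mixed_test, mixed_control) / region_ratio(ref_test, ref_control)


# Hypothetical strengths: the test chromosome amplifies 20% more efficiently
# than the control chromosome in both samples (a chromosome bias of 1.2).
reference_test, reference_control = [1.20, 1.20], [1.00, 1.00]
mixed_test, mixed_control = [1.32, 1.32], [1.00, 1.00]

normalized = ratio_of_ratios(mixed_test, mixed_control, reference_test, reference_control)
print(normalized)
```

Here the raw mixed-sample ratio of 1.32 would overstate the effect, while the normalized value of about 1.10 reflects only the excess attributable to the mixed sample.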

In some cases, the methods herein contemplate detecting fetal abnormality by comparing an abundance of one or more maternal allele(s) in a first genomic region or regions (test region(s)) with one or more maternal alleles in a second genomic region or regions (control region(s)) in a mixed sample (e.g., maternal blood sample from a pregnant animal). This ratio can then be compared to a similar ratio measured in a control sample (e.g., maternal-cell only sample). The control sample can be a diluted subset of the mixed sample, wherein the dilution is by a factor of at least 10, 100, 1000, or 10,000. In some cases, such methods further involve estimating the number of fetal cells in the mixed sample. This can be performed by, e.g., ranking the alleles detected according to their abundance. The ranking can be used to determine abundance of one or more paternal alleles. Ranking is described in more detail herein.

Aneuploidy can be determined by modeling SNP data. One example of a model for SNP data in the context of fetal diagnosis is given in Equations 1-3 below.

A normal (diploid) fetus results in data x_(k) at locus k and is represented by:

$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left((m_{k1}\ \text{or}\ m_{k2}) + p_k\right)\right] + \text{residual} \qquad (1)$$

A trisomy caused by maternal non-disjunction is represented by

$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left(m_{k1}+m_{k2}+p_k\right)\right] + \text{residual} \qquad (2)$$

and a paternally inherited trisomy is represented by

$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left((m_{k1}\ \text{or}\ m_{k2}) + p_{k1}+p_{k2}\right)\right] + \text{residual} \qquad (3)$$

In Equations 1-3, A_(k) denotes a scale factor which subsumes the efficiencies of amplification, hybridization, and readout common to the alleles at locus k. In this model amplification differences between different primer pairs are fitted and do not appear in the residuals. Alternatively, a single A parameter could be used and the residuals would reflect these differences. Further, f represents the fraction of fetal cells in the mixture, m_(k1) and m_(k2) denote the maternal alleles at locus k, and p_(k) denotes the paternal allele at locus k. The allele symbols actually represent unit data contributions that can be arithmetically summed; e.g., m_(k1) might be a detection of the ‘C’ genotype represented by unit contribution to the ‘C’ bin at that locus.
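To make the notion of summed unit contributions concrete, the noise-free part of Equations 1-3 can be evaluated for a single locus as in the following sketch. The four-bin representation (A, C, G, T), the function names, the assumed maternal genotype C/T and the values A_(k)=1 and f=0.2 are illustrative assumptions and are not prescribed by the model itself.

```python
BASES = "ACGT"


def unit(allele):
    """Unit contribution of one allele to the four base bins at a locus."""
    return [1.0 if base == allele else 0.0 for base in BASES]


def add(*vectors):
    """Element-wise sum of bin vectors."""
    return [sum(values) for values in zip(*vectors)]


def predict(a_k, f, maternal, fetal):
    """Noise-free x_k = A_k[(1 - f)(maternal alleles) + f(fetal alleles)].

    maternal -- the two maternal alleles (m_k1, m_k2)
    fetal    -- alleles carried by the fetus: two for Equation 1,
                three for the trisomy models of Equations 2 and 3
    """
    mother = add(*(unit(a) for a in maternal))
    fetus = add(*(unit(a) for a in fetal))
    return [a_k * ((1.0 - f) * m + f * c) for m, c in zip(mother, fetus)]


# Hypothetical locus: mother C/T, fetus inherits maternal C and paternal G.
diploid = predict(1.0, 0.2, "CT", "CG")             # Equation 1
maternal_trisomy = predict(1.0, 0.2, "CT", "CTG")   # Equation 2
paternal_trisomy = predict(1.0, 0.2, "CT", "CGG")   # Equation 3
for label, x in (("Eq. 1", diploid), ("Eq. 2", maternal_trisomy), ("Eq. 3", paternal_trisomy)):
    print(label, dict(zip(BASES, x)))
```

In this example the three models differ only in the T bin (maternal trisomy) or the G bin (paternal trisomy), which is the kind of difference the fitting procedure has to resolve against measurement error.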

FIG. 6 illustrates the SNP calls that result under this data model. At Locus 1, the fetal genotype was GC. There is a paternally inherited ‘G’ allele contribution in the mixed sample that results in an increase of G signal above the noise level observed in the maternal-only sample, and a maternally inherited ‘C’ allele contribution that increases the C signal. The effective value that has been assumed in these illustrations is f=0.2. At Locus 2, the paternal allele is ‘T’. At Locus 3, the fetus is homozygous GG. In the third row of FIG. 2, the effect of a fetal trisomy is represented by the dashed red lines, superposed on a normal (diploid) mixed-sample pattern. The trisomy is assumed to include Loci 1 and 2, but not Loci 3 and 4. At Loci 1 and 2 both maternal allele strengths are increased in the mixed sample, as well as the separate paternal allele contribution. At Locus 3, it was assumed that the fetus was ‘GG’ and the paternal allele is the same as the first maternal allele. Note that the ratio between the average of the two maternal alleles and the paternal allele will be slightly greater at Loci 1 and 2 than at Locus 4; this is one indicator of trisomy.

The location and abundance of SNPs can be used to determine whether the fetus has an abnormal genotype, such as Down syndrome or Klinefelter Syndrome (XXY). Other examples of abnormal fetal genotypes include, but are not limited to, aneuploidy such as monosomy of one or more chromosomes (X chromosome monosomy, also known as Turner's syndrome), trisomy of one or more chromosomes (13, 18, 21, and X), tetrasomy and pentasomy of one or more chromosomes (which in humans is most commonly observed in the sex chromosomes, e.g. XXXX, XXYY, XXXY, XYYY, XXXXX, XXXXY, XXXYY, XYYYY and XXYYY), triploidy (three of every chromosome, e.g. 69 chromosomes in humans), tetraploidy (four of every chromosome, e.g. 92 chromosomes in humans) and multiploidy. In some embodiments, an abnormal fetal genotype is a segmental aneuploidy. Examples of segmental aneuploidy include, but are not limited to, 1p36 duplication, dup(17)(p11.2p11.2) syndrome, Down syndrome, Pelizaeus-Merzbacher disease, dup(22)(q11.2q11.2) syndrome, and cat-eye syndrome. In some cases, an abnormal fetal genotype is due to one or more deletions of sex or autosomal chromosomes, which may result in a condition such as Cri-du-chat syndrome, Wolf-Hirschhorn syndrome, Williams-Beuren syndrome, Charcot-Marie-Tooth disease, Hereditary neuropathy with liability to pressure palsies, Smith-Magenis syndrome, Neurofibromatosis, Alagille syndrome, Velocardiofacial syndrome, DiGeorge syndrome, Steroid sulfatase deficiency, Kallmann syndrome, Microphthalmia with linear skin defects, Adrenal hypoplasia, Glycerol kinase deficiency, Pelizaeus-Merzbacher disease, Testis-determining factor on Y, Azoospermia (factor a), Azoospermia (factor b), Azoospermia (factor c), or 1p36 deletion. In some embodiments, a decrease in chromosomal number results in an XO syndrome.

In some cases, data models are fitted for optimal detection of aneuploidy. For example, the data models can be used to simultaneously recover estimates of the fraction of fetal cells and to efficiently detect aneuploidy in hypothesized chromosomes or chromosomal segments. This integrated approach results in more reliable and sensitive declarations of aneuploidy.

Equations 1-3 represent five different models because of the ambiguity between m_(k1) and m_(k2) in the last term of Equations 1 and 3. In other words, since Equations 1 and 3 are different and in each equation there are two possibilities (i.e., m_(k1) or m_(k2)), it follows that each of Equations 1 and 3 represents two different models. Together with Equation 2, Equations 1-3 therefore represent five different models. Testing for aneuploidy of Chromosomes 13, 18, and 21, for example, entails 5×5×5=125 different model variants that would be fit to the data.
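The count of 125 model variants can be reproduced by enumerating the five per-chromosome models over the three chromosomes. The labels used below are hypothetical names chosen for this sketch.

```python
from itertools import product

# Five models: Equations 1 and 3 each come in two forms, depending on which
# maternal allele (m_k1 or m_k2) the fetus inherited; Equation 2 has one form.
MODEL_VARIANTS = [
    ("Eq. 1", "m_k1"),
    ("Eq. 1", "m_k2"),
    ("Eq. 2", None),
    ("Eq. 3", "m_k1"),
    ("Eq. 3", "m_k2"),
]
CHROMOSOMES = (13, 18, 21)

# One model is chosen independently for each chromosome under test.
combinations = list(product(MODEL_VARIANTS, repeat=len(CHROMOSOMES)))
print(len(combinations))
```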

The parameter values for the maternal allele identities are taken from the results for the reference (i.e. maternal-only) sample and the remaining parameters are fit to the data from the mixed sample. Because the number of parameters is very large when the number of loci is large, a global optimization requires iterative search techniques. One possible approach is to do the following for each model variant:

i) Set f to 0 and solve for A_(k) at each locus.
ii) Set f to a value equal to the smallest fetal/maternal cell ratio for which fetal cells are likely to be detectable.
iii) Solve for paternal allele(s) identities and strengths at each locus, one locus at a time, that minimize data-model residuals.
iv) Fix the paternal alleles and adjust f to minimize residuals over all the data.
v) Now vary only the A_(k) to minimize residuals. Repeat iv and v until convergence.
vi) Repeat iii through v until convergence.
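A heavily simplified sketch of the inner part of this search (steps iv and v) is given below. It is not the full procedure: the fetal alleles are assumed to be already known (so step iii is skipped), each A_(k) is solved in closed form by least squares for every trial value of f instead of by alternating updates, and f is scanned over a fixed grid. The function names, grid settings, genotypes and scale factors are all hypothetical.

```python
BASES = "ACGT"


def template(f, maternal, fetal):
    """Bracketed model term, (1 - f) * maternal + f * fetal, for each base bin."""
    return [(1.0 - f) * maternal.count(b) + f * fetal.count(b) for b in BASES]


def locus_fit(x, t):
    """Least-squares scale A_k for one locus and the resulting squared residual."""
    a_k = sum(xi * ti for xi, ti in zip(x, t)) / sum(ti * ti for ti in t)
    residual = sum((xi - a_k * ti) ** 2 for xi, ti in zip(x, t))
    return residual, a_k


def fit_fraction(data, genotypes, grid_steps=100, f_upper=0.5):
    """Scan f over a grid; return (total residual, best f, A_k list) at the minimum."""
    best = None
    for i in range(grid_steps + 1):
        f = f_upper * i / grid_steps
        fits = [locus_fit(x, template(f, m, c)) for x, (m, c) in zip(data, genotypes)]
        total = sum(residual for residual, _ in fits)
        if best is None or total < best[0]:
            best = (total, f, [a_k for _, a_k in fits])
    return best


# Hypothetical noise-free data: (maternal alleles, fetal alleles) per locus,
# generated with f = 0.2 and a different scale factor at each locus.
genotypes = [("CT", "CG"), ("AA", "AT"), ("GG", "GC")]
true_scales = [1.0, 0.7, 1.3]
data = [[a * v for v in template(0.2, m, c)] for a, (m, c) in zip(true_scales, genotypes)]

total_residual, f_hat, scales = fit_fraction(data, genotypes)
print(f_hat, scales)
```

With noise-free input the scan recovers the fetal fraction and the per-locus scale factors used to generate the data. With real measurements the residuals would be weighted by their expected errors, and the outer loop over paternal allele identities would still be required.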

The best overall fit of model to data is selected from among all the model variants. The best overall fit yields the values of f and A_(k), which we will call f_(max) and A_(kmax). The likelihood of observing the data given f_(max) can be compared to the likelihood given f=0. The ratio is a measure of the amount of evidence for fetal DNA. A typical threshold for declaring fetal DNA would be a likelihood ratio of ˜1000 or more. The likelihood calculation can be approximated by a more familiar Chi-squared calculation involving the sum of squared residuals between the data and the model, where each residual is normalized by the expected rms error. This Chi-squared is a good approximation to the Log(likelihood) to the extent the expected errors in the data are Gaussian additive errors, or can be made so by some amplitude transformation of the data.
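For independent Gaussian additive errors, the log-likelihood equals minus one half of the Chi-squared plus a constant, so the log of the likelihood ratio between two fits is half the difference of their Chi-squared values. The sketch below applies that relationship to the threshold of ˜1000 mentioned above; the function names, the single-locus data values and the rms error of 0.05 are hypothetical.

```python
from math import log


def chi_squared(data, model, rms_error):
    """Sum of squared residuals, each normalized by its expected rms error."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, rms_error))


def log_likelihood_ratio(chi2_null, chi2_alt):
    """ln(L_alt / L_null) under the Gaussian additive-error assumption."""
    return 0.5 * (chi2_null - chi2_alt)


def fetal_dna_declared(chi2_f0, chi2_fmax, threshold=1000.0):
    """True when the likelihood ratio of the best fit to f = 0 meets the threshold."""
    return log_likelihood_ratio(chi2_f0, chi2_fmax) >= log(threshold)


# Hypothetical bins at one locus, with model predictions for f = 0 and f = f_max.
observed = [1.00, 0.21, 0.79, 0.01]
model_f0 = [1.00, 0.00, 1.00, 0.00]
model_fmax = [1.00, 0.20, 0.80, 0.00]
errors = [0.05] * 4

chi2_f0 = chi_squared(observed, model_f0, errors)
chi2_fmax = chi_squared(observed, model_fmax, errors)
declared = fetal_dna_declared(chi2_f0, chi2_fmax)
print(chi2_f0, chi2_fmax, declared)
```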

If, based on the above determination of likelihood ratio, it is decided that fetal DNA is not present, then the test is declared to be non-informative. If it is decided that fetal DNA is present, then the likelihoods of the data given the different data model types can be compared to declare aneuploidy. The likelihood ratios of aneuploidy models (Equations 2 and 3) to the normal model (Equation 1) are calculated and these ratios are compared to a predefined threshold. Typically this threshold is set so that in controlled tests all the trisomic cases are declared aneuploid. Thus, it is expected that the vast majority (>99.9%) of all truly trisomic cases are declared aneuploid by the test. Another approach to accomplish approximately the 99.9% detection rate is to increase the likelihood ratio threshold beyond that necessary to declare all the known trisomic cases in the validation set by a factor of 1000/N, where N is the number of trisomy cases in the validation set.

In step 161 (FIG. 1), which is optional, the presence of fetal cells and ratio of fetal alleles/maternal alleles is determined. Because the fraction of fetal cells can be small or even zero, the aneuploidy signal (the departure of the observed ratio from unity) may be weak even when fetal aneuploidy is present. An independent estimate of fetal cell fraction, including a confidence estimate of whether measurable fetal DNA is present at all, is useful in interpreting the observed aneuploidy ratios. FIG. 7 illustrates allele signals re-ordered by rank. Assuming the mother has no more than two alleles at each locus, the magnitude of the third ranked allele is potentially a robust indicator of the presence of fetal DNA. Although measurement errors can artificially inflate the size of the third and fourth alleles, they are very unlikely to result in a bimodal distribution for the relative magnitude of the third allele with respect to the first two. Such a bimodal distribution is illustrated in FIG. 8. The secondary peak of this distribution occurs at a value approximately equal to the fraction of fetal cells. (This is one way to determine the value of the variable f in the data model.) The statistical confidence that the bimodality is real can be used to assign a confidence that fetal DNA was present in the mixed sample. Statistical tests for bimodality are discussed in M. Y. Cheng and P. Hall, J. R. Statist. Soc. B (1998), 60 (Part 3) pp. 579-589. If this confidence level exceeds a threshold, e.g., 90%, 95%, 99% or 99.9%, an aneuploidy call may be made. The threshold set can be stringent (e.g. 99.9%) to avoid declaring a fetus normal when in fact it is not. Thus, the independently estimated fetal cell fraction can be used to interpret the aneuploidy statistic. For example, a value f=0.5 along with an estimated aneuploidy ratio from the fetal-maternal mixture of 0.05±0.01 would weaken the evidence for aneuploidy because the ratio is too small to be consistent with the independently determined f value (the ratio should be ˜1+f/2). As another example, a value f=0.1 along with an estimated aneuploidy ratio from the fetal-maternal mixture of 0.05±0.02 would tend to strengthen the evidence for aneuploidy because the observed ratio is consistent with the independently derived value of f.
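The two numerical examples above can be checked with a short calculation. The sketch assumes, as an interpretation, that the quoted values of 0.05 are departures of the aneuploidy ratio from unity, so that they are compared with an expected excess of f/2; the function name is hypothetical.

```python
def consistency_z(f, observed_excess, sigma):
    """Standard errors separating the observed excess over unity from f / 2.

    Assumes the observed value is the departure of the aneuploidy ratio from 1,
    so the expected value for a trisomy is f / 2 (ratio of about 1 + f / 2).
    """
    expected_excess = f / 2.0
    return (observed_excess - expected_excess) / sigma


# First example: f = 0.5 with an observed excess of 0.05 +/- 0.01.
z_inconsistent = consistency_z(0.5, 0.05, 0.01)
# Second example: f = 0.1 with an observed excess of 0.05 +/- 0.02.
z_consistent = consistency_z(0.1, 0.05, 0.02)
print(z_inconsistent, z_consistent)
```

Under this reading the first observation lies about 20 standard errors below the value implied by f, while the second matches it, which agrees with the conclusions drawn in the text.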

In any of the embodiments herein SNP data may be analyzed for possible errors. For example, in some instances SNP data can contain small additive errors associated with the readout technology, multiplicative errors associated with DNA amplification and hybridization efficiencies being different from locus to locus and from allele to allele within a locus, and errors associated with imperfect specificity in the process. By including the many parameters A_(k) in the model, rather than a single scale parameter, the residuals include allele-to-allele efficiency differences but not locus-to-locus differences. These tend to be multiplicative errors in the resulting observed allele strengths or peak heights (e.g., two signals may be 20% different in strength although the starting concentrations of the alleles are identical). In other words, by providing many parameters, the errors that are otherwise attributable to locus-to-locus differences are minimized. As a first approximation, one can assume errors are random from allele to allele; errors have relatively small additive measurement noise error components; and larger Poisson and multiplicative error components exist. The magnitudes of these error components can be estimated from repeated processing of identical samples. A Chi-square residuals calculation for any data-model fit then can be supported with these modeled squared errors for any peak height or data bin.

For example, we anticipate a large scale SNP genotyping platform such as the GoldenGate assay by Illumina will provide ˜100 SNP loci per chromosome of interest. Measurements of repeated ‘normal’ pregnancy samples would give ratios of paternal to maternal allele strengths which varied by ˜20% due to assay errors. Averaged over the 100 loci in a chromosome, the ratio error would be reduced to 20%/sqrt(100), or ˜2%. For an assumed fetal/maternal cell ratio of 0.2 in a sample, the expected observed aneuploidy ratio in the case of a trisomy would be 1.10 with an estimation error of 0.02, yielding a confident (5 sigma) detection of aneuploidy.
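The arithmetic of this example can be written out as follows. The variable names are hypothetical; the expected ratio of 1 + f/2 is the relationship stated elsewhere in this description, and the averaging assumes independent errors of equal size at each locus.

```python
from math import sqrt

per_locus_error = 0.20  # ~20% variation of the allele ratio at a single locus
n_loci = 100            # ~100 SNP loci per chromosome of interest
f = 0.2                 # assumed fetal/maternal cell ratio

# Averaging n independent measurements reduces the error by sqrt(n).
averaged_error = per_locus_error / sqrt(n_loci)

# Expected aneuploidy ratio for a trisomy, and its distance from unity
# expressed in units of the averaged error.
expected_ratio = 1.0 + f / 2.0
sigma_detection = (expected_ratio - 1.0) / averaged_error

print(averaged_error, expected_ratio, sigma_detection)
```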

Alternatively, when using a single A parameter, the residuals will be larger and will contain a component which is correlated between alleles at the same locus. Calculation of likelihood will need to take this correlation into account.

Another aspect of the invention involves computer executable logic for determining the presence of fetal cells in a mixed sample and fetal abnormalities and/or conditions in such cells. A computer program product is described comprising a computer usable medium having the computer executable logic (computer software program, including program code) stored therein. The computer executable logic, when executed by the processor, causes the processor to perform one or more functions described herein. For example, computer executable logic can be utilized to automate, process or control sample collection, sample enrichment, pre-amplification, SNP data modeling, estimating fetal/maternal allele ratio, comparing maternal allele intensity from a suspected aneuploid region and a control region, and determining the existence of aneuploidy and the type of aneuploidy if one exists.

For example, the computer executable logic can determine the presence and ratio of fetal cells to maternal cells in a mixed sample. The executable code can also receive data for one or more SNPs, and apply such data to one or more data models. The computer executable logic can then calculate a set of values for each of the data sets associated with each data model; select the data model that best fits the data; and model and calculate any potential errors in the data models. For example, computer executable logic can determine the ratio of maternal alleles to paternal alleles in one or more SNP locations; and/or the ratio of maternal alleles in a region suspected of aneuploidy and a control region. One example of a data model provides a determination of a fetal abnormality from given data signals of SNPs at two genomic regions. The executable logic can establish the presence or absence of trisomy, and conclude whether the trisomy is paternally derived or if it originated from a maternal non-disjunction event. For example, the program can fit SNP data to the following model, which can provide the diagnosis as follows:

A normal (diploid) fetus results in data x_(k) at locus k and is represented by:

$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left((m_{k1}\ \text{or}\ m_{k2}) + p_k\right)\right] + \text{residual} \qquad (1)$$

A trisomy caused by maternal non-disjunction is represented by

$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left(m_{k1}+m_{k2}+p_k\right)\right] + \text{residual} \qquad (2)$$

and a paternally inherited trisomy is represented by

$$x_k = A_k\left[(1-f)(m_{k1}+m_{k2}) + f\left((m_{k1}\ \text{or}\ m_{k2}) + p_{k1}+p_{k2}\right)\right] + \text{residual} \qquad (3)$$

In Equations 1-3, A_(k) denotes a scale factor which subsumes the efficiencies of amplification, hybridization, and readout common to the alleles at locus k. In this model amplification differences between different primer pairs are fitted and do not appear in the residuals. Alternatively, a single A parameter could be used and the residuals would reflect these differences. Further, f represents the fraction of fetal cells in the mixture, m_(k1) and m_(k2) denote the maternal alleles at locus k, and p_(k) denotes the paternal allele at locus k. The allele symbols actually represent unit data contributions that can be arithmetically summed; e.g., m_(k1) might be a detection of the ‘C’ genotype represented by unit contribution to the ‘C’ bin at that locus.

In some cases, the computer executable logic records data measurements corresponding to readouts (e.g., SNP intensities from a DNA microarray or a sequencing machine). Such measurements can be processed by the computer executable logic to determine fetal/maternal allele ratios and provide a call with respect to detection of aneuploidy. Moreover, computer executable logic can control display of such results in print or electronic formats, which an operator can view. Thus, computer executable logic can include code for receiving data on one or more target DNA polymorphisms (i.e. SNP loci); calculating a set of values for each of the data sets associated with each data model; and selecting the data model that best fits the data, wherein the best model will be an indication of the presence of fetal cells in the mixed sample and fetal abnormalities and/or conditions in said cells. The determination of presence of fetal cells in the mixed sample and fetal abnormalities and/or conditions in said cells can be made by the computer executable logic or a user. Therefore, the computer based logic can provide results for estimating fetal/maternal ratios, allele strength and aneuploidy, which can be observed by a technician or operator.

EXAMPLES

Example 1 Separation of Fetal Cord Blood

FIGS. 12A-D show a schematic of the device used to separate nucleated cells from fetal cord blood.

Dimensions: 100 mm×28 mm×1 mm

Array design: 3 stages, gap size=18, 12 and 8 μm for the first, second and third stage, respectively.

Device fabrication: The arrays and channels were fabricated in silicon using standard photolithography and deep silicon reactive etching techniques. The etch depth was 140 μm. Through holes for fluid access were made using KOH wet etching. The silicon substrate was sealed on the etched face to form enclosed fluidic channels using a blood compatible pressure sensitive adhesive (9795, 3M, St Paul, Minn.).

Device packaging: The device was mechanically mated to a plastic manifold with external fluidic reservoirs to deliver blood and buffer to the device and extract the generated fractions.

Device operation: An external pressure source was used to apply a pressure of 2.0 PSI to the buffer and blood reservoirs to modulate fluidic delivery and extraction from the packaged device.

Experimental conditions: Human fetal cord blood was drawn into phosphate buffered saline containing Acid Citrate Dextrose anticoagulants. 1 mL of blood was processed at 3 mL/hr using the device described above at room temperature and within 48 hrs of draw. Nucleated cells from the blood were separated from enucleated cells (red blood cells and platelets) and plasma, and delivered into a buffer stream of calcium and magnesium-free Dulbecco's Phosphate Buffered Saline (14190-144, Invitrogen, Carlsbad, Calif.) containing 1% Bovine Serum Albumin (BSA) (A8412-100 ML, Sigma-Aldrich, St Louis, Mo.) and 2 mM EDTA (15575-020, Invitrogen, Carlsbad, Calif.).

Measurement techniques: Cell smears of the product and waste fractions(FIG. 8A-8B) were prepared and stained with modified Wright-Giemsa(WG16, Sigma Aldrich, St. Louis, Mo.).

Performance: Fetal nucleated red blood cells were observed in the product fraction (FIG. 8A) and were absent from the waste fraction (FIG. 8B).

Example 2 Isolation of Fetal Cells from Maternal Blood

The device and process described in detail in Example 1 were used in combination with immunomagnetic affinity enrichment techniques to demonstrate the feasibility of isolating fetal cells from maternal blood.

Experimental conditions: Blood from consenting maternal donors carrying male fetuses was collected into K₂EDTA vacutainers (366643, Becton Dickinson, Franklin Lakes, N.J.) immediately following elective termination of pregnancy. The undiluted blood was processed using the device described in Example 1 at room temperature and within 9 hrs of draw. Nucleated cells from the blood were separated from enucleated cells (red blood cells and platelets) and plasma, and delivered into a buffer stream of calcium and magnesium-free Dulbecco's Phosphate Buffered Saline (14190-144, Invitrogen, Carlsbad, Calif.) containing 1% Bovine Serum Albumin (BSA) (A8412-100 ML, Sigma-Aldrich, St Louis, Mo.). Subsequently, the nucleated cell fraction was labeled with anti-CD71 microbeads (130-046-201, Miltenyi Biotech Inc., Auburn, Calif.) and enriched using the MiniMACS™ MS column (130-042-201, Miltenyi Biotech Inc., Auburn, Calif.) according to the manufacturer's specifications. Finally, the CD71-positive fraction was spotted onto glass slides.

Measurement techniques: Spotted slides were stained using fluorescence in situ hybridization (FISH) techniques according to the manufacturer's specifications using Vysis probes (Abbott Laboratories, Downer's Grove, Ill.). Samples were stained for the presence of X and Y chromosomes. In one case, a sample prepared from a known Trisomy 21 pregnancy was also stained for chromosome 21.

Performance: Isolation of fetal cells was confirmed by the reliable presence of male cells in the CD71-positive population prepared from the nucleated cell fractions (FIGS. 10A-10F). In the single abnormal case tested, the trisomy 21 pathology was also identified (FIG. 11).

Example 3 Confirmation of the Presence of Male Fetal Cells in Enriched Samples

Confirmation of the presence of a male fetal cell in an enriched sample is performed using qPCR with primers specific for DYZ, a marker repeated in high copy number on the Y chromosome. After enrichment of fnRBC by any of the methods described herein, the resulting enriched fnRBC are binned by dividing the sample into 100 PCR wells. Prior to binning, enriched samples may be screened by FISH to determine the presence of any fnRBC containing an aneuploidy of interest. Because of the low number of fnRBC in maternal blood, only a portion of the wells will contain a single fnRBC (the other wells are expected to be negative for fnRBC). The cells are fixed in 2% paraformaldehyde and stored at 4° C. Cells in each bin are pelleted and resuspended in 5 μl PBS plus 1 μl 20 mg/ml Proteinase K (Sigma #P-2308). Cells are lysed by incubation at 65° C. for 60 minutes followed by inactivation of the Proteinase K by incubation for 15 minutes at 95° C. For each reaction, primer sets (DYZ forward primer TCGAGTGCATTCCATTCCG; DYZ reverse primer ATGGAATGGCATCAAACGGAA; and DYZ TaqMan Probe 6FAM-TGGCTGTCCATTCCA-MGBNFQ), TaqMan Universal PCR master mix, No AmpErase and water are added. The samples are run and analysis is performed on an ABI 7300: 2 minutes at 50° C., 10 minutes at 95° C. followed by 40 cycles of 95° C. (15 seconds) and 60° C. (1 minute). Following confirmation of the presence of male fetal cells, further analysis of bins containing fnRBC is performed. Positive bins may be pooled prior to further analysis.

FIG. 13 shows the results expected from such an experiment. The data in FIG. 13 was collected by the following protocol. Nucleated red blood cells were enriched from cord cell blood of a male fetus by sucrose gradient two Heme Extractions (HE). The cells were fixed in 2% paraformaldehyde and stored at 4° C. Approximately 10×1000 cells were pelleted and resuspended each in 5 μl PBS plus 1 μl 20 mg/ml Proteinase K (Sigma #P-2308). Cells were lysed by incubation at 65° C. for 60 minutes followed by inactivation of the Proteinase K by incubation for 15 minutes at 95° C. Cells were combined and serially diluted 10-fold in PBS to obtain final concentrations of 100, 10 and 1 cell per 6 μl. Six μl of each dilution was assayed in quadruplicate in 96 well format. For each reaction, primer sets (DYZ forward primer TCGAGTGCATTCCATTCCG; 0.9 μM DYZ reverse primer ATGGAATGGCATCAAACGGAA; and 0.5 μM DYZ TaqMan Probe 6FAM-TGGCTGTCCATTCCA-MGBNFQ), TaqMan Universal PCR master mix, No AmpErase and water were added to a final volume of 25 μl per reaction. Plates were run and analyzed on an ABI 7300: 2 minutes at 50° C., 10 minutes at 95° C. followed by 40 cycles of 95° C. (15 seconds) and 60° C. (1 minute). These results show that detection of a single fnRBC in a bin is possible using this method.

Example 4 Confirmation of the Presence of Fetal Cells in Enriched Samples by STR Analysis

Maternal blood is processed through a size-based separation module, with or without subsequent MHEM enhancement of fnRBCs. The enhanced sample is then subjected to FISH analysis using probes specific to the aneuploidy of interest (e.g., trisomy 13, trisomy 18, and XYY). Individual positive cells are isolated by “plucking” them from the enhanced sample using standard micromanipulation techniques. Using a nested PCR protocol, STR marker sets are amplified and analyzed to confirm that the FISH-positive aneuploid cell(s) are of fetal origin. For this analysis, comparison to the maternal genotype is typical. An example of a potential resulting data set is shown in Table 2. Non-maternal alleles may be proven to be paternal alleles by paternal genotyping or genotyping of known fetal tissue samples. As can be seen, the presence of paternal alleles in the resulting cells demonstrates that the cell is of fetal origin (cells #1, 2, 9, and 10). Positive cells may be pooled for further analysis to diagnose aneuploidy of the fetus, or may be further analyzed individually.

TABLE 2 STR locus alleles in maternal and fetal cells
DNA Source          STR locus D14S   STR locus D16S   STR locus D8S   STR locus F13B   STR locus vWA
Maternal alleles    14, 17           11, 12           12, 14          9, 9             16, 17
Cell #1 alleles     8 19
Cell #2 alleles     17 15
Cell #3 alleles     14
Cell #4 alleles
Cell #5 alleles     17 12 9
Cell #6 alleles
Cell #7 alleles     19
Cell #8 alleles
Cell #9 alleles     17 14 7, 9 17, 19
Cell #10 alleles    15
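The comparison against the maternal genotype can be sketched as a simple set difference per locus. This is a minimal illustration, not the analysis software used for Table 2. The maternal genotypes are taken from the maternal row of Table 2; the cell genotype and the function name are hypothetical.

```python
def non_maternal_alleles(maternal: dict, cell: dict) -> dict:
    """Return, per STR locus, the alleles seen in a cell but absent from the mother."""
    result = {}
    for locus, alleles in cell.items():
        extra = sorted(set(alleles) - set(maternal.get(locus, ())))
        if extra:
            result[locus] = extra
    return result

# Maternal genotypes at two loci, from the maternal row of Table 2.
maternal_genotype = {"D14S": (14, 17), "vWA": (16, 17)}

# Hypothetical single-cell result, for illustration only.
cell_genotype = {"D14S": (17,), "vWA": (17, 19)}

candidate_paternal = non_maternal_alleles(maternal_genotype, cell_genotype)
```

A non-empty result marks the cell as a candidate fetal cell; as the text notes, paternal genotyping or known fetal tissue would be needed to prove that a non-maternal allele is paternal.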

Example 5 Confirmation of the Presence of Fetal Cells in Enriched Samples by SNP Analysis

Maternal blood is processed through a size-based separation module, with or without subsequent MHEM enhancement of fnRBCs. The enhanced sample is then subjected to FISH analysis using probes specific to the aneuploidy of interest (e.g., trisomy 13, trisomy 18, and XYY). Samples testing positive with FISH analysis are then binned into 96 microtiter wells, each well containing 15 μl of the enhanced sample. Of the 96 wells, 5-10 are expected to contain a single fnRBC and each well should contain approximately 1000 nucleated maternal cells (both WBC and mnRBC). Cells are pelleted and resuspended in 5 μl PBS plus 1 μl 20 mg/ml Proteinase K (Sigma #P-2308). Cells are lysed by incubation at 65° C. for 60 minutes, followed by inactivation of the Proteinase K for 15 minutes at 95° C.

In this example, the maternal genotype (BB) and fetal genotype (AB) for a particular set of SNPs are known. The genotypes A and B encompass all three SNPs and differ from each other at all three SNPs. The following sequence from chromosome 7 contains these three SNPs (rs7795605, rs7795611 and rs7795233, indicated in brackets, respectively): ATGCAGCAAGGCACAGACTAA[G/A]CAAGGAGA[G/C]GCAAAATTTTC[A/G]TAGGGGAGAGAAATGGGTCATT.

In the first round of PCR, genomic DNA from binned enriched cells is amplified using primers specific to the outer portion of the fetal-specific allele A and which flank the interior SNP (forward primer ATGCAGCAAGGCACAGACTACG; reverse primer AGAGGGGAGAGAAATGGGTCATT). In the second round of PCR, amplification using real time SYBR Green PCR is performed with primers specific to the inner portion of allele A and which encompass the interior SNP (forward primer CAAGGCACAGACTAAGCAAGGAGAG; reverse primer GGCAAAATTTTCATAGGGGAGAGAAATGGGTCATT).
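The relationship between the bracketed SNP notation and the inner, allele-specific primers can be checked with a short sketch. It assumes that allele A corresponds to the first base in each bracket and allele B to the second; this reading is inferred from the inner primer sequences listed above and is not stated explicitly in the text. Both inner primers are compared as plain substrings of the displayed strand, which is how they are written above. The function name is hypothetical.

```python
import re

def expand_haplotype(seq: str, pick: int) -> str:
    """Replace each [X/Y] bracket with X (pick=0) or Y (pick=1)."""
    return re.sub(r"\[([ACGT])/([ACGT])\]", lambda m: m.group(pick + 1), seq)

REGION = ("ATGCAGCAAGGCACAGACTAA[G/A]CAAGGAGA[G/C]"
          "GCAAAATTTTC[A/G]TAGGGGAGAGAAATGGGTCATT")

allele_a = expand_haplotype(REGION, 0)  # assumed fetal-specific haplotype
allele_b = expand_haplotype(REGION, 1)  # assumed maternal haplotype

INNER_FORWARD = "CAAGGCACAGACTAAGCAAGGAGAG"
INNER_REVERSE = "GGCAAAATTTTCATAGGGGAGAGAAATGGGTCATT"

# Each inner primer matches haplotype A exactly and mismatches haplotype B.
matches = {
    "forward_in_A": INNER_FORWARD in allele_a,
    "forward_in_B": INNER_FORWARD in allele_b,
    "reverse_in_A": INNER_REVERSE in allele_a,
    "reverse_in_B": INNER_REVERSE in allele_b,
}
```

Under this reading, each inner primer ends on or spans a SNP position where A and B differ, which is what makes the second-round amplification specific to allele A.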

Expected results are shown in FIG. 14. Here, six of the 96 wells test positive for allele A, confirming the presence of cells of fetal origin, because the maternal genotype (BB) is known and cannot be positive for allele A. DNA from positive wells may be pooled for further analysis or analyzed individually.

Example 6 Use of Highly Parallel Genotyping and High Throughput Sequencing for Fetal Diagnosis

Fetal cells or nuclei can be isolated as described in the enrichment section or as described in Example 1. The enrichment process described in Example 1 may generate a final mixture containing approximately 500 maternal white blood cells (WBCs), approximately 100 maternal nucleated red blood cells (mnRBCs), and a minimum of approximately 10 fetal nucleated red blood cells (fnRBCs) starting from an initial 20 ml blood sample taken late in the first trimester. In the context of fetal diagnosis, it is very valuable to have a reference sample containing only the mother's genotype. When the diagnosis procedure is based on enriching for circulating fetal cells in the mother's blood, the reference sample can be created simply by not enriching for fetal cells, and then diluting enough to ensure that <<1 fetal cell is expected in the sample used as input to the SNP detection process. Alternatively, white blood cells can be selected, for which the circulating fetal fraction is negligible.

Perform Multiple Displacement Amplification (MDA): Current technologies and protocols for highly parallel SNP detection with DNA microarray readout result in inaccurate calls when there are too few starting DNA copies or when a particular allele represents a small fraction in the population of input DNA molecules. In the methods described herein, a ratio-preserving pre-amplification of the DNA, such as multiple displacement amplification, is done to provide enough copies to support accurate SNP detection via the primer extension ligation methods described below. This pre-amplification method is chosen to produce, as closely as possible, the same amplification factor for all target regions of the genome.

Multiple displacement amplification protocols can be performed as described in Gonzalez et al. Environmental Microbiology 7(7) 1024-1028, (2005). Briefly, samples are suspended in 100 μl 10 mM Tris-HCl buffer (pH 7.5). Cells are lysed by adding 100 μl of alkaline lysis solution (400 mM KOH, 100 mM DTT, 10 mM EDTA) and incubating cells for 10 min on ice. Lysed cells are neutralized with 100 μl of neutralization solution (2 ml 1 M HCl and 3 ml 1 M Tris-HCl). Lysed cells are used directly as template in MDA and PCR reactions.

1 μl template DNA in 9 μl sample buffer (50 mM Tris-HCl (pH 8.2), 0.5 mM EDTA) containing random hexamers is denatured at 95° C. for 3 min and placed on ice. Buffer (9 μl) containing dNTPs and 1 μl enzyme mix containing Φ29 DNA polymerase are added to the 10 μl of denatured DNA template-random hexamers solution and incubated at 30° C. for 6 h. A final incubation at 65° C. for 10 min inactivates the Φ29 DNA polymerase.

Highly Parallel Genotyping: Highly parallel SNP detection can be used to obtain information about genotype and gene copy numbers at a large number of loci scattered across the genome, in one procedure. Highly parallel SNP genotyping can be performed as described in Fan et al. Cold Spring Harb Symp Quant Biol; 68: 69-78, (2003). Genomic DNA is immobilized to streptavidin-coated magnetic beads by mixing 20 μl of DNA (100 ng/μl) with 5 μl of photobiotin (0.2 μg/μl) and 15 μl of mineral oil, and incubating at 95° C. for 30 minutes. Trizma base (25 μl of 0.1 M) is added, followed by two extractions with 75 μl of sec-butanol to remove unreacted photobiotin. The extracted gDNA (20 μl) is then mixed with 34 μl of Paramagnetic Particle A Reagent (MPA; Illumina) and incubated at room temperature for 90 minutes. The immobilized gDNA is then washed twice with DNA wash buffer (WDI; Illumina) and resuspended at 10 ng/μl in WDI. In each subsequent reaction, 200 ng (10 μl) of DNA is used.

Assay oligonucleotides are then annealed to the genomic DNA by combining the immobilized DNA (10 μl) with annealing reagent (MAI; Illumina; 30 μl) and SNP-specific oligonucleotides (10 μl containing 25 nM of each oligonucleotide) to a final volume of 50 μl. Locus-specific oligonucleotides (LSOs) are synthesized with a 5′ phosphate to enable ligation. Annealing is carried out by ramping the temperature from 70° C. to 30° C. over ~8 hours, then holding at 30° C. until the next processing step.

After annealing, excess and mishybridized oligonucleotides are washed away, and 37 μl of master mix for extension (MME; Illumina) is added to the beads. Extension is carried out at room temperature for 15 minutes. After washing, 37 μl of master mix for ligation (MML; Illumina) is added to the extension products, and incubated for 20 minutes at 57° C. to allow the extended upstream oligo to ligate to the downstream oligo.

The extension products are then amplified by PCR. After extension and ligation, the beads are washed with universal buffer 1 (UB1; Illumina), resuspended in 35 μl of elution buffer (IP1; Illumina) and heated at 95° C. for one minute to release the ligated products. The supernatant is then used in a 60-μl PCR. PCR reactions are thermocycled as follows: 10 seconds at 25° C.; 34 cycles of (35 seconds at 95° C., 35 seconds at 56° C., 2 minutes at 72° C.); 10 minutes at 72° C.; and cooled to 4° C. for 5 minutes. The three universal PCR primers (P1, P2, and P3) are labeled with Cy3, Cy5, and biotin, respectively.

High throughput sequencing: After the SNP-specific ligation-extension reaction and amplification of the products, readout of the SNP types can be done using high throughput sequencing as described in Margulies et al. Nature 437 376-380, (2005). Briefly, the amplicons are diluted and mixed with beads such that each bead captures a single molecule of the amplified material. The DNA-carrying beads are isolated in separate 100 μm aqueous droplets made through the creation of a PCR-reaction-mixture-in-oil emulsion. The DNA molecule on each bead is then amplified to generate millions of copies of the sequence, which all remain bound to the bead. Finally, the beads are placed into a highly parallel sequencing-by-synthesis machine which can generate over 400,000 sequence reads (~100 bp per read) in a single 4 hour run.

Fetal Diagnosis: The SNP data obtained from the high throughput sequencing are analyzed for fetal diagnosis using the methods described in Example 9.

Example 7 Use of Highly Parallel Genotyping and Bead Arrays for Fetal Diagnosis

Fetal cells or nuclei can be isolated as described in the enrichment section or as described in Example 1. The enrichment process described in Example 1 may generate a final mixture containing approximately 500 maternal white blood cells (WBCs), approximately 100 maternal nucleated red blood cells (mnRBCs), and a minimum of approximately 10 fetal nucleated red blood cells (fnRBCs) starting from an initial 20 ml blood sample taken late in the first trimester. In the context of fetal diagnosis, it is very valuable to have a reference sample containing only the mother's genotype. When the diagnosis procedure is based on enriching for circulating fetal cells in the mother's blood, the reference sample can be created simply by not enriching for fetal cells, and then diluting enough to ensure that <<1 fetal cell is expected in the sample used as input to the SNP detection process. Alternatively, white blood cells can be selected, for which the circulating fetal fraction is negligible.

Perform Linear Amplification of Genomic DNA: Current technologies and protocols for highly parallel SNP detection with DNA microarray readout result in inaccurate calls when there are too few starting DNA copies or when a particular allele represents a small fraction in the population of input DNA molecules. In the methods described herein, a ratio-preserving pre-amplification of the DNA, such as linear amplification of genomic DNA, is done to provide enough copies to support accurate SNP detection via the primer extension ligation methods described below. This pre-amplification method is chosen to produce, as closely as possible, the same amplification factor for all target regions of the genome.

Linear amplification protocols can be performed as described in Liu et al. BMC Genomics 4(1) 19-30 (2003). This protocol uses a terminal transferase tailing step and second strand synthesis to incorporate T7 promoters at the ends of the DNA fragments prior to in vitro transcription (IVT). Briefly, genomic DNA can be obtained either by ChIP or by restriction digests. ChIP DNA is fragmented by sonication and isolated using antibody against di-methyl-H3 K4. Restricted genomic DNA is prepared as follows: genomic DNA isolated by bead lysis, phenol/chloroform extraction, and ethanol precipitation is restricted either with AluI or with RsaI (New England BioLabs (NEB)). Digested products then undergo electrophoresis on a 2% agarose gel. Restriction fragments in the 100-700 bp size range are excised from the gel and purified using the QIAquick Gel Extraction Kit (Qiagen).

Calf intestinal alkaline phosphatase (CIP) (NEB) is used to remove 3′ phosphate groups from DNA samples prior to IVT. Up to 500 ng DNA is incubated with 2.5 U enzyme in a 10 μl volume with the supplied buffer at 37° C. for 1 hour. The reaction is cleaned up with the MinElute Reaction Cleanup Kit (Qiagen) per manufacturer instructions, except that the elution volume is increased to 20 μl.

PolyT tails are generated using terminal transferase (TdT) as follows. Up to 50 ng of CIP-treated template DNA is incubated for 20 minutes at 37° C. in a 10 μl solution containing 20 U TdT (NEB), 0.2 M potassium cacodylate, 25 mM HCl pH 6.6, 0.25 mg/ml BSA, 0.75 mM CoCl₂, 4.6 μM dTTP and 0.4 μM ddCTP. The reaction is halted by the addition of 2 μl of 0.5 M EDTA pH 8.0, and the product is isolated with the MinElute Reaction Cleanup Kit (Qiagen), increasing the elution volume to 20 μl.

Second strand synthesis and incorporation of the T7 promoter sequence is carried out as follows: the 20 μl tailing reaction product is mixed with 0.6 μl of 25 μM T7-A18B primer (5′-CATTAGCGGCCGCGAAATTAATACGACTCACTATAGGGAG(A)18[B], where B refers to C, G or T), 5 μl 10× EcoPol buffer (100 mM Tris-HCl pH 7.5, 50 mM MgCl₂, 75 mM DTT), 2 μl 5.0 mM dNTPs, and 20.4 μl nuclease-free water. In experiments with 10-50 ng starting material, the end primer concentration is kept at 300 nM, while the reaction volume is scaled down to maintain an end concentration of 1 ng/μl starting material. For starting amounts less than 10 ng, the volume is kept at 10 μl. If necessary, volume reduction of the eluate from the TdT tailing is performed in a vacuum centrifuge on medium heat. Samples are incubated at 94° C. for 2 minutes to denature, ramped down at −1° C./sec to 35° C., held at 35° C. for 2 minutes to anneal, ramped down at −0.5° C./sec to 25° C. and held while Klenow enzyme (NEB) is added to an end concentration of 0.2 U/μl. The sample is then incubated at 37° C. for 90 minutes for extension. The reaction is halted by addition of 5 μl 0.5 M EDTA pH 8.0 and the product is isolated with the MinElute Reaction Cleanup Kit (Qiagen), increasing the elution volume to 20 μl.

Prior to in vitro transcription, samples are concentrated in a vacuum centrifuge at medium heat to 8 μl volume. The in vitro transcription is performed with the T7 Megascript Kit (Ambion) per manufacturer's instructions, except that the 37° C. incubation is increased to 16 hours. The samples are purified with the RNeasy Mini Kit (Qiagen) per manufacturer's RNA cleanup protocol, except with an additional 500 μl wash with buffer RPE. RNA is quantified by absorbance at 260 nm, and visualized on a denaturing 1.25× MOPS-EDTA-sodium acetate gel.

Highly Parallel Genotyping: Highly parallel SNP detection can be used to obtain information about genotype and gene copy numbers at a large number of loci scattered across the genome, in one procedure. Highly parallel SNP genotyping can be performed as described in Example 6.

Bead Array: After the SNP-specific ligation-extension reaction and amplification of the products, readout of the SNP types can be done using bead arrays as described in Shen et al. Mutation Research 573 70-82, (2005). Double-stranded PCR products are immobilized onto paramagnetic particles by adding 20 μl of Paramagnetic Particle B Reagent (MPB; Illumina) to each 60-μl PCR, and incubated at room temperature for a minimum of 60 minutes. The bound PCR products are washed with universal buffer 2 (UB2; Illumina), and denatured by adding 30 μl of 0.1 N NaOH. After one minute at room temperature, 25 μl of the released ssDNAs is neutralized with 25 μl of hybridization reagent (MHI; Illumina) and hybridized to arrays.

Arrays are hydrated in UB2 for 3 minutes at room temperature, and then preconditioned in 0.1 N NaOH for 30 seconds. Arrays are returned to the UB2 reagent for at least 1 minute to neutralize the NaOH. The pretreated arrays are exposed to the labeled ssDNA samples described above. Hybridization is conducted under a temperature gradient program from 60° C. to 45° C. over ~12 hours. The hybridization is held at 45° C. until the array is processed. After hybridization, the arrays are first rinsed twice in UB2 and once with IS1 (Illumina) at room temperature with mild agitation, and then imaged at a resolution of 0.8 microns using a BeadArray Reader (Illumina). PMT settings are optimized for dynamic range, channel balance, and signal-to-noise ratio. Cy3 and Cy5 dyes are excited by lasers emitting at 532 nm and 635 nm, respectively.

The automatic calling of genotypes is performed by the GenCall genotype calling software using a Bayesian model, which compares intensities between probes for allele A and allele B across a large number of samples to create archetypal clustering patterns. These patterns allow the genotyping data to be assigned membership to clusters using a probabilistic model and allow assignment of a corresponding GenCall score. For example, data points falling between two clusters are assigned a low probability score of being a member of either cluster and have a correspondingly low GenCall score. The cluster quality can be assessed by evaluating the cluster separation score (CSS), a measure of statistical separation between clusters. It is defined as

$\mathrm{CSS} = \min\left( \frac{\left| \theta_{AB} - \theta_{AA} \right|}{\sigma_{AB} + \sigma_{AA}},\ \frac{\left| \theta_{AB} - \theta_{BB} \right|}{\sigma_{AB} + \sigma_{BB}} \right).$

Loci with cluster scores around the cutoff of 3.0 are visually evaluated and the training clusters refined by manual intervention. A cutoff value of 3.0 can be chosen for the CSS on the basis of minimizing strand concordance errors. Loci with questionable clusters are scored as unsuccessful and excluded from further analysis.
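The CSS can be computed directly from per-cluster summary statistics. The sketch below is an illustration, not part of the GenCall software. It assumes that θ denotes the center of each genotype cluster (AA, AB, BB) and σ its spread, and that the two terms inside the minimum are the separations of the heterozygous cluster from each homozygous cluster. The numeric values are hypothetical.

```python
def cluster_separation_score(theta: dict, sigma: dict) -> float:
    """Smaller of the two normalized distances from the AB cluster to AA and BB."""
    sep_aa = abs(theta["AB"] - theta["AA"]) / (sigma["AB"] + sigma["AA"])
    sep_bb = abs(theta["AB"] - theta["BB"]) / (sigma["AB"] + sigma["BB"])
    return min(sep_aa, sep_bb)

CSS_CUTOFF = 3.0  # cutoff value given in the text

# Hypothetical cluster centers and spreads for one locus.
theta = {"AA": 0.05, "AB": 0.50, "BB": 0.95}
sigma = {"AA": 0.02, "AB": 0.04, "BB": 0.03}

css = cluster_separation_score(theta, sigma)
locus_passes = css >= CSS_CUTOFF
```

Taking the minimum means a locus is judged by its worst-separated pair of clusters, so one poorly resolved homozygous cluster is enough to flag the locus for review.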

Fetal Diagnosis: The SNP data obtained from the bead array assay are analyzed for fetal diagnosis using the methods described in Example 9.

Example 8 Use of Highly Parallel Genotyping and DNA Arrays for Fetal Diagnosis

Fetal cells or nuclei can be isolated as described in the enrichment section or as described in Example 1. The enrichment process described in Example 1 may generate a final mixture containing approximately 500 maternal white blood cells (WBCs), approximately 100 maternal nucleated red blood cells (mnRBCs), and a minimum of approximately 10 fetal nucleated red blood cells (fnRBCs) starting from an initial 20 ml blood sample taken late in the first trimester. In the context of fetal diagnosis, it is very valuable to have a reference sample containing only the mother's genotype. When the diagnosis procedure is based on enriching for circulating fetal cells in the mother's blood, the reference sample can be created simply by not enriching for fetal cells, and then diluting enough to ensure that <<1 fetal cell is expected in the sample used as input to the SNP detection process. Alternatively, white blood cells can be selected, for which the circulating fetal fraction is negligible.

Perform Multiple Displacement Amplification: Current technologies and protocols for highly parallel SNP detection with DNA microarray readout result in inaccurate calls when there are too few starting DNA copies or when a particular allele represents a small fraction in the population of input DNA molecules. In the methods described herein, a ratio-preserving pre-amplification of the DNA, such as multiple displacement amplification, is done to provide enough copies to support accurate SNP detection via the primer extension ligation methods described below. This pre-amplification method is chosen to produce, as closely as possible, the same amplification factor for all target regions of the genome. Multiple displacement amplification protocols can be performed as described in Example 6.

Highly Parallel Genotyping: Highly parallel SNP detection can be used to obtain information about genotype and gene copy numbers at a large number of loci scattered across the genome, in one procedure. Highly parallel SNP genotyping can be performed as described in Example 6.

DNA Array: After the SNP-specific ligation-extension reaction and amplification of the products, readout of the SNP types can be done using DNA arrays as described in Gunderson et al. Nature Genetics 37(5) 549-554, (2005). The array data can be obtained using Illumina's Sentrix BeadArray matrix. Oligonucleotide probes on the beads are 75 bases in length; 25 bases at the 5′ end are used for decoding and the remaining 50 bases are locus-specific. The oligonucleotides are immobilized on activated beads using a 5′ amino group. The array can contain probes for SNP assays (probe pairs, allele A and allele B).

The amplification products of the SNP-specific ligation-extension reaction are denatured at 95° C. for 5 min and then exposed to the Sentrix array matrix, which is mated to a microtiter plate, submerging the fiber bundles in 15 ml of hybridization sample. The entire assembly is incubated for 14-18 h at 48° C. with shaking. After hybridization, arrays are washed in 1× hybridization buffer and 20% formamide at 48° C. for 5 min.

Allele Specific Primer Extension (ASPE) can be used to score SNPs. Before carrying out the array-based primer extension reaction, Sentrix array matrices are washed for 1 min with wash buffer (33.3 mM NaCl, 3.3 mM potassium phosphate and 0.1% Tween-20, pH 7.6) and then incubated for 15 min in 50 μl of ASPE reaction buffer (Illumina EMM, containing polymerase, a mix of biotin-labeled and unlabeled nucleotides, single-stranded binding protein, bovine serum albumin and appropriate buffers and salts) at 37° C. After the reaction, the arrays are immediately stripped in freshly prepared 0.1 N NaOH for 2 min and then washed and neutralized twice in 1× hybridization buffer for 30 s. The biotin-labeled nucleotides incorporated during primer extension are then detected using a sandwich assay as described in Pinkel et al. PNAS 83 (1986) 2934-2938. The arrays are blocked at room temperature for 10 min in 1 mg ml⁻¹ bovine serum albumin in 1× hybridization buffer and then washed for 1 min in 1× hybridization buffer. The arrays are then stained with streptavidin-phycoerythrin solution (1× hybridization buffer, 3 μg ml⁻¹ streptavidin-phycoerythrin (Molecular Probes) and 1 mg ml⁻¹ bovine serum albumin) for 10 min at room temperature. The arrays are washed with 1× hybridization buffer for 1 min and then counterstained with an antibody reagent (10 mg ml⁻¹ biotinylated antibody to streptavidin (Vector Labs) in 1× PBST (137 mM NaCl, 2.7 mM KCl, 4.3 mM sodium phosphate, 1.4 mM potassium phosphate and 0.1% Tween-20) supplemented with 6 mg ml⁻¹ goat normal serum) for 20 min. After counterstaining, the arrays are washed in 1× hybridization buffer and restained with streptavidin-phycoerythrin solution for 10 min. The arrays are washed one final time in 1× hybridization buffer before being imaged in 1× hybridization buffer on a custom CCD-based BeadArray imaging system. The intensities are extracted using custom image analysis software.

The automatic calling of genotypes is performed by the GenCall genotype calling software as described in Example 7.

Fetal Diagnosis: The SNP data obtained from the DNA array assay are analyzed for fetal diagnosis using the methods described in Example 9.

Example 9 Fetal Diagnosis

Results obtained in Examples 6, 7, and 8 can be used for fetal diagnosis.

A model for SNP data in the context of fetal diagnosis is given in Equations 1-3. A normal (diploid) fetus will result in data x_(k) at locus k:

x_(k) = A_(k)[(m_(k1) + m_(k2)) + f((m_(k1) or m_(k2)) + p_(k))] + residual  (1)

A trisomy caused by maternal non-disjunction is represented by

x_(k) = A_(k)[(m_(k1) + m_(k2)) + f(m_(k1) + m_(k2) + p_(k))] + residual  (2)

and a paternally inherited trisomy is represented by

x_(k) = A_(k)[(m_(k1) + m_(k2)) + f((m_(k1) or m_(k2)) + p_(k1) + p_(k2))] + residual  (3)

In Equations 1-3, A_(k) denotes a scale factor which subsumes the efficiencies of amplification, hybridization, and readout common to the alleles at locus k. In this model, amplification differences between different primer pairs are fitted and do not appear in the residuals. Alternatively, a single A parameter could be used, and the residuals would reflect these differences. The parameter f represents the fraction of fetal cells in the mixture, m_(k1) and m_(k2) denote the maternal alleles at locus k, and p_(k) denotes the paternal allele at locus k (p_(k1) and p_(k2) in Equation 3 denote the two paternal alleles inherited in a paternal trisomy). The allele symbols actually represent unit data contributions that can be arithmetically summed; e.g., m_(k1) might be a detection of the ‘C’ genotype represented by a unit contribution to the ‘C’ bin at that locus.
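The way unit allele contributions add up in Equations 1-3 can be made concrete with a minimal sketch. It computes only the bracketed model term scaled by A_(k) and omits the residual. The function name, the allele letters, and the values A_(k) = 1 and f = 0.2 are hypothetical illustration choices.

```python
from collections import Counter

def expected_signal(scale: float, f: float, maternal: tuple, fetal: tuple) -> dict:
    """Model signal per allele bin: scale * (maternal unit counts + f * fetal unit counts)."""
    counts = Counter()
    for allele in maternal:   # each maternal allele contributes one unit
        counts[allele] += 1.0
    for allele in fetal:      # each fetal allele contributes f units
        counts[allele] += f
    return {allele: scale * value for allele, value in counts.items()}

mother = ("C", "T")

# Equation 1: the fetus carries one maternal allele (here 'C') and paternal 'G'.
normal = expected_signal(1.0, 0.2, mother, ("C", "G"))

# Equation 2: maternal non-disjunction, both maternal alleles plus paternal 'G'.
maternal_trisomy = expected_signal(1.0, 0.2, mother, ("C", "T", "G"))

# Equation 3: paternal trisomy, one maternal allele plus two paternal alleles.
paternal_trisomy = expected_signal(1.0, 0.2, mother, ("C", "G", "G"))
```

Comparing the three outputs shows the signature each model predicts: the maternal trisomy raises both maternal bins, while the paternal trisomy doubles the paternal contribution.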

FIG. 6 illustrates the kinds of SNP calls that result under this data model. At Locus 1, the fetal genotype is GC. There is a paternally inherited ‘G’ allele contribution in the mixed sample that results in an increase of G signal above the noise level observed in the maternal-only sample, and a maternally inherited ‘C’ allele contribution that increases the C signal. The effective value of f that has been assumed in these illustrations is f=0.2. At Locus 2, the paternal allele is ‘T’. At Locus 3, the fetus is homozygous GG. In the third row of FIG. 6, the effect of a fetal trisomy is represented by the dashed red lines, superposed on a normal (diploid) mixed-sample pattern. The trisomy is assumed to include Loci 1 and 2, but not Loci 3 and 4. At Loci 1 and 2, both maternal allele strengths are increased in the mixed sample, as well as the separate paternal allele contribution. At Locus 3, it was assumed that the fetus was ‘GG’ and the paternal allele is the same as the first maternal allele. Note that the ratio between the average of the two maternal alleles and the paternal allele will be slightly greater at Loci 1 and 2 than at Locus 4; this is one indicator of trisomy.

Simple, Suboptimal Detection Methods

A simple intuitive understanding of the effect of trisomy is that it increases the abundances of fetal alleles at loci within the affected region. Trisomies are predominantly from maternal non-disjunction events, so typically both maternal alleles, and a single paternal allele, are increased, and the ratio of maternal allele abundance to paternal allele abundance is higher in the trisomic region. These signatures may be masked by differences in DNA amplification and hybridization efficiency from locus to locus, and from allele to allele.

Within a locus, the PCR differences are smaller than between loci, because the same primers are responsible for all the different allele amplicons at that locus. Therefore, the allele ratios may be more stable than the overall allele abundances. This can be exploited by identifying loci where the paternal allele is distinct from the maternal alleles and taking the ratio of the paternal allele strength to the average of the maternal allele strengths. These allele ratios then can be averaged over the hypothesized aneuploidy region and compared to the average over a control region. The distributions of these ratio values in the hypothesized aneuploidy region and in the control region can be compared to create an estimate of statistical significance for the observed difference in means. A simple example of this procedure would use Student's t-test.
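The ratio-based comparison can be sketched with the classical pooled-variance form of Student's t statistic, written out with the standard library so that each step is visible. This is an illustration under the assumption of equal variances in the two regions; the function names are hypothetical and the input values are toy numbers, not measured data. Converting the statistic to a p-value would require the t distribution with the returned degrees of freedom, which is not shown here.

```python
import math
import statistics

def allele_ratio(paternal: float, maternal_1: float, maternal_2: float) -> float:
    """Paternal allele strength divided by the average maternal allele strength."""
    return paternal / ((maternal_1 + maternal_2) / 2.0)

def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    t = (mean_a - mean_b) / math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return t, dof

# Toy inputs chosen only to make the arithmetic easy to follow.
example_ratio = allele_ratio(0.2, 1.2, 1.0)
t_stat, dof = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In use, the first sample would hold the per-locus allele ratios from the hypothesized aneuploidy region and the second those from the control region.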

Alternatively, the maternal allele strengths over the suspected aneuploid region can be compared to those in the control region, all without forming any ratios to paternal alleles. In this approach, errors in the measurement of the paternal allele abundances do not enter; however, the differences in amplification efficiency between primer pairs do enter, and these typically will be larger than differences between alleles in the same locus. In this approach there also may be a residual bias between the efficiencies averaged over certain chromosomes; therefore it may be useful to perform the entire detection process resulting in an observed abundance ratio for the mixed sample, do it also for the maternal sample, and then take the ratio of ratios. This ratio of ratios will be free of the chromosome bias; however, it will include errors in the measurements of the maternal sample.
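The ratio of ratios can be sketched as follows. The example assumes, for illustration only, that the chromosome-level efficiency bias multiplies the suspected region by the same factor in both the mixed and the maternal-only sample, so that it cancels in the final quotient. Function names and numeric values are hypothetical.

```python
import statistics

def abundance_ratio(test_region, control_region) -> float:
    """Mean maternal allele strength in the test region over that in the control region."""
    return statistics.fmean(test_region) / statistics.fmean(control_region)

def ratio_of_ratios(mixed_test, mixed_control, maternal_test, maternal_control) -> float:
    """Mixed-sample abundance ratio normalized by the maternal-only abundance ratio."""
    return (abundance_ratio(mixed_test, mixed_control)
            / abundance_ratio(maternal_test, maternal_control))

# Hypothetical strengths: a 1.1x chromosome bias in both samples, plus a
# further 1.1x increase in the mixed sample over the suspected region.
result = ratio_of_ratios(
    mixed_test=[2.42, 2.42], mixed_control=[2.0, 2.0],
    maternal_test=[2.2, 2.2], maternal_control=[2.0, 2.0],
)
```

Here the mixed-sample ratio alone is 1.21, but dividing by the maternal-only ratio of 1.1 leaves 1.1, the part not explained by chromosome bias.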

Because the fraction of fetal cells can be small or even zero, the aneuploidy signal (the departure of the observed ratio from unity) may be weak even when fetal aneuploidy is present. An independent estimate of the fetal cell fraction, including a confidence estimate of whether measurable fetal DNA is present at all, is useful in interpreting the observed aneuploidy ratios. FIG. 7 illustrates allele signals re-ordered by rank. Assuming the mother has no more than two alleles at each locus, the magnitude of the third ranked allele is potentially a robust indicator of the presence of fetal DNA. Although measurement errors can artificially inflate the size of the third and fourth alleles, they are very unlikely to result in a bimodal distribution for the relative magnitude of the third allele with respect to the first two. Such a bimodal distribution is cartooned in FIG. 8. The secondary peak of this distribution occurs at a value approximately equal to the fraction of fetal cells. This is one way to determine the value of the variable f in the data model. The statistical confidence that the bimodality is real can be used to assign a confidence that fetal DNA was present in the mixed sample. Statistical tests for bimodality are discussed in MY Cheng and P Hall, J. R. Statist. Soc. B (1998), 60 (Part 3) pp 579-589, and these authors prefer bootstrap based methods. Only if this confidence exceeds a threshold, say 99.9%, would an aneuploidy call be attempted. This threshold needs to be quite stringent to avoid the expensive mistake of declaring a fetus normal when in fact it is not. The estimated fetal cell fraction can be used to interpret the aneuploidy statistic: a large value of f and an observed aneuploidy ratio very close to unity would suggest no aneuploidy; a small value of f along with an aneuploidy ratio approximately equal to 1+f/2 would suggest trisomy, but it is still necessary to decide whether the observed aneuploidy is significantly different from unity, and this requires an error model. A simple robust estimate of the error distribution could come from repeated processing of nominally identical samples.
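Two of the quantities in this paragraph can be sketched numerically. The first function computes the relative magnitude of the third ranked allele at one locus. The second gives one way to arrive at the 1+f/2 figure, under the assumption (consistent with Equations 1-3, but not derived in the text) that each maternal cell contributes two chromosome copies with weight 1 and each fetal cell contributes three copies in the trisomic region, or two elsewhere, with weight f. Function names and inputs are hypothetical.

```python
def third_allele_fraction(signals) -> float:
    """Third ranked allele signal relative to the mean of the two strongest signals."""
    ranked = sorted(signals, reverse=True)
    if len(ranked) < 3:
        return 0.0
    return ranked[2] / ((ranked[0] + ranked[1]) / 2.0)

def expected_trisomy_ratio(f: float) -> float:
    """Chromosome dosage in a trisomic region relative to a disomic control region."""
    return (2.0 + 3.0 * f) / (2.0 + 2.0 * f)

# Hypothetical allele signals at one locus (two maternal alleles, one paternal, noise).
fraction = third_allele_fraction([1.0, 0.2, 1.2, 0.01])

# For small f the exact dosage ratio is close to 1 + f/2.
ratio_exact = expected_trisomy_ratio(0.1)
ratio_approx = 1.0 + 0.1 / 2.0
```

Collecting the third-allele fraction over many loci would give the distribution whose secondary peak is used in the text to estimate f.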

Fitting of Data to the Model for Optimal Detection of Aneuploidy

The data model can be used to simultaneously estimate the fraction of fetal cells and efficiently detect aneuploidies in hypothesized chromosomes or chromosomal segments. This integrated approach should result in more reliable and sensitive declarations of aneuploidy.

Equations 1-3 actually represent five different models because of the ambiguity between m_(k1) and m_(k2) in the last term of Equations 1 and 3. Testing for aneuploidy of Chromosomes 13, 18, and 21 then would entail 5×5×5=125 different model variants that would be fit to the data.
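The count of 125 variants follows from choosing one of five models independently for each of three chromosomes, which can be enumerated as below. The model labels are hypothetical names for the five cases: Equation 1 with either maternal allele, Equation 2, and Equation 3 with either maternal allele.

```python
from itertools import product

MODEL_TYPES = (
    "normal_m1",            # Equation 1, fetus inherits m_(k1)
    "normal_m2",            # Equation 1, fetus inherits m_(k2)
    "maternal_trisomy",     # Equation 2
    "paternal_trisomy_m1",  # Equation 3, fetus inherits m_(k1)
    "paternal_trisomy_m2",  # Equation 3, fetus inherits m_(k2)
)
CHROMOSOMES = ("13", "18", "21")

# One variant assigns a model type to every tested chromosome.
variants = [
    dict(zip(CHROMOSOMES, combo))
    for combo in product(MODEL_TYPES, repeat=len(CHROMOSOMES))
]
```

Each entry of `variants` is one candidate to be fitted to the mixed-sample data.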

The parameter values for the maternal allele identities are taken from the results for the maternal-only sample and the remaining parameters are fit to the data from the mixed sample. Because the number of parameters is very large when the number of loci is large, a global optimization requires iterative search techniques. One possible approach is to do the following for each model variant:

i) Set f to 0 and solve for A_(k) at each locus.
ii) Set f to a value equal to the smallest fetal/maternal cell ratio for which fetal cells are likely to be detectable.
iii) Solve for paternal allele(s) identities and strengths at each locus, one locus at a time, that minimize data-model residuals.
iv) Fix the paternal alleles and adjust f to minimize residuals over all the data.
v) Now vary only the A_(k) to minimize residuals. Repeat iv and v until convergence.
vi) Repeat iii through v until convergence.
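Steps i through vi can be sketched for a single, simplified case. The code below handles only the normal model (Equation 1), treats allele identities as single letters, uses unweighted least squares for every sub-step, and searches the fetal genotype over one maternal allele plus any paternal allele. All names, the starting value of f, and the noise-free toy data are assumptions made for illustration; the sketch makes no claim to find the global optimum.

```python
ALLELES = "ACGT"

def pattern(maternal: str, fetal: str, f: float) -> dict:
    """Bracketed term of Equation 1: maternal unit counts plus f times fetal counts."""
    return {a: maternal.count(a) + f * fetal.count(a) for a in ALLELES}

def sse(x: dict, scale: float, maternal: str, fetal: str, f: float) -> float:
    p = pattern(maternal, fetal, f)
    return sum((x.get(a, 0.0) - scale * p[a]) ** 2 for a in ALLELES)

def fit_scale(x: dict, maternal: str, fetal: str, f: float) -> float:
    """Least-squares A_k for one locus with f and genotypes held fixed."""
    p = pattern(maternal, fetal, f)
    return sum(x.get(a, 0.0) * p[a] for a in ALLELES) / sum(v * v for v in p.values())

def fit_f(data, scales, maternal, fetal) -> float:
    """Least-squares f over all loci with A_k and genotypes held fixed."""
    num = den = 0.0
    for x, scale, m, g in zip(data, scales, maternal, fetal):
        for a in ALLELES:
            basis = scale * g.count(a)
            num += (x.get(a, 0.0) - scale * m.count(a)) * basis
            den += basis * basis
    return max(num / den, 0.0) if den else 0.0

def fit_normal_model(data, maternal, f_min=0.01, tol=1e-9, max_iter=100):
    scales = [fit_scale(x, m, "", 0.0) for x, m in zip(data, maternal)]  # step i
    f = f_min                                                            # step ii
    fetal = None
    for _ in range(max_iter):                                            # step vi
        new_fetal = [                                                    # step iii
            min((mi + p for mi in sorted(set(m)) for p in ALLELES),
                key=lambda g: sse(x, scale, m, g, f))
            for x, scale, m in zip(data, scales, maternal)
        ]
        for _ in range(max_iter):                                        # steps iv, v
            f_new = fit_f(data, scales, maternal, new_fetal)
            scales = [fit_scale(x, m, g, f_new)
                      for x, m, g in zip(data, maternal, new_fetal)]
            converged = abs(f_new - f) < tol
            f = f_new
            if converged:
                break
        if new_fetal == fetal:
            break
        fetal = new_fetal
    return f, scales, fetal

# Noise-free toy data generated from f = 0.2, A = (100, 80), fetal genotypes CG and AT.
toy_data = [{"C": 120.0, "T": 100.0, "G": 20.0}, {"A": 176.0, "T": 16.0}]
f_hat, scales_hat, fetal_hat = fit_normal_model(toy_data, ["CT", "AA"])
```

On this toy input the alternating updates recover the generating parameters. With real data, the same loop would be run for every model variant and the fits compared.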

The best overall fit of model to data is selected from among all the model variants. The best overall fit yields the values of f and A_(k) we will call f_(max), A_(kmax). The likelihood of observing the data given f_(max) can be compared to the likelihood given f=0. The ratio is a measure of the amount of evidence for fetal DNA. A typical threshold for declaring fetal DNA would be a likelihood ratio of ~1000 or more. The likelihood calculation can be approximated by a more familiar Chi-squared calculation involving the sum of squared residuals between the data and the model, where each residual is normalized by the expected rms error. This Chi-squared is a good approximation to the Log(likelihood) to the extent the expected errors in the data are Gaussian additive errors, or can be made so by some amplitude transformation of the data.
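
The relationship between the Chi-squared statistic and the likelihood ratio can be sketched as follows. The sketch assumes independent Gaussian errors with the same expected rms error in both fits, in which case the likelihood is proportional to exp(-Chi-squared/2) and the natural logarithm of the likelihood ratio is half the difference in Chi-squared. The function names and the sample numbers are hypothetical.

```python
import math

def chi_squared(data, model, rms_errors):
    """Sum of squared data-model residuals, each normalized by its expected rms error."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, rms_errors))

def log_likelihood_ratio(chi2_null, chi2_alt):
    """ln[L(alt) / L(null)], assuming Gaussian errors common to both fits."""
    return 0.5 * (chi2_null - chi2_alt)

def fetal_dna_detected(chi2_f0, chi2_fmax, threshold=1000.0):
    """True when the likelihood ratio of the f_max fit to the f=0 fit meets the threshold."""
    return log_likelihood_ratio(chi2_f0, chi2_fmax) >= math.log(threshold)

# Hypothetical data and model predictions for three measurements.
data = [1.10, 0.95, 1.20]
rms = [0.02, 0.02, 0.02]
chi2_f0 = chi_squared(data, [1.00, 1.00, 1.00], rms)    # fit with f = 0
chi2_fmax = chi_squared(data, [1.08, 0.97, 1.18], rms)  # fit with f = f_max
print(fetal_dna_detected(chi2_f0, chi2_fmax))
```

Working in logarithms avoids overflow when the difference in Chi-squared is large.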

If, based on the above determination of likelihood ratio, it is decided that fetal DNA is not present, then the test is declared to be non-informative. If it is decided that fetal DNA is present, then the likelihoods of the data given the different data model types can be compared to declare aneuploidy. The likelihood ratios of aneuploid models (Equations 2 and 3) to the normal model (Equation 1) are calculated and these ratios are compared to a predefined threshold. Typically this threshold would be set so that in controlled tests all the trisomic cases would be declared aneuploid, and so that it would be expected that the vast majority (>99.9%) of all truly trisomic cases would be declared aneuploid by the test. Given a limited patient cohort size for test validation, one strategy to accomplish approximately the 99.9% detection rate is to increase the likelihood ratio threshold beyond that necessary to declare all the known trisomic cases in the validation set by a factor of 1000/N, where N is the number of trisomy cases in the validation set.
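
The two-stage decision described above can be sketched as a simple cascade. The function name, the returned labels, and the dictionary keys are hypothetical; the aneuploidy threshold is left as a required argument because its value is to be set during test validation.

```python
def classify_sample(lr_fetal, aneuploid_lrs, aneuploid_threshold, fetal_threshold=1000.0):
    """Apply the fetal-DNA test first, then the aneuploidy test.

    lr_fetal: likelihood ratio of the best fit (f_max) to the f=0 fit.
    aneuploid_lrs: mapping from aneuploid model name to its likelihood
        ratio relative to the normal model.
    """
    if lr_fetal < fetal_threshold:
        return "non-informative"
    if max(aneuploid_lrs.values()) >= aneuploid_threshold:
        return "aneuploid"
    return "no aneuploidy declared"

# Hypothetical likelihood ratios for illustration only.
print(classify_sample(5.0e4, {"model_2": 3.0e3, "model_3": 1.2}, aneuploid_threshold=1.0e3))
```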

Error Modeling

The data contain small additive errors associated with the readout technology, multiplicative errors associated with DNA amplification and hybridization efficiencies being different from locus to locus and from allele to allele within a locus, and errors associated with imperfect specificity in the process. When the model includes the many parameters A_(k), rather than a single scale parameter, the residuals will include allele-to-allele efficiency differences but not locus-to-locus differences. These tend to be multiplicative errors in the observed allele strengths (peak heights); i.e., two signals may differ by 20% in strength although the starting concentrations of the alleles were identical. As a first approximation, we can assume that errors are random from allele to allele, with relatively small additive components and larger Poisson and multiplicative components. The magnitudes of these error components can be estimated from repeated processing of identical samples. The Chi-square residuals calculation for any data-model fit can then be supported with these modeled squared errors for any peak height or data bin.
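
One way to combine the three error components into a single expected rms error is sketched below. The sketch assumes that the components are independent and therefore add in quadrature, and that the Poisson variance is proportional to the signal through a hypothetical scale factor; neither assumption, nor any of the parameter names or values, is prescribed above.

```python
import math

def expected_rms_error(signal, additive_sigma, multiplicative_fraction, poisson_scale=1.0):
    """Expected rms error for one peak height or data bin.

    variance = additive_sigma**2                    (readout error)
             + poisson_scale * signal               (Poisson component)
             + (multiplicative_fraction * signal)**2  (efficiency differences)
    """
    variance = (
        additive_sigma ** 2
        + poisson_scale * signal
        + (multiplicative_fraction * signal) ** 2
    )
    return math.sqrt(variance)

def normalized_residual(observed, modeled, additive_sigma, multiplicative_fraction,
                        poisson_scale=1.0):
    """Data-model residual divided by the expected rms error at the modeled signal."""
    sigma = expected_rms_error(modeled, additive_sigma, multiplicative_fraction, poisson_scale)
    return (observed - modeled) / sigma

# Hypothetical values: signal of 100 units, 2 units additive, 20% multiplicative.
sigma = expected_rms_error(100.0, additive_sigma=2.0, multiplicative_fraction=0.2)
print(round(sigma, 3))
```

The squares of such normalized residuals, summed over all peaks or bins, give the Chi-squared statistic used in the fit.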

Alternatively, when using a single A parameter, the residuals will be larger and will contain a component which is correlated between alleles at the same locus. Calculation of likelihood will need to take this correlation into account.

1. A method for detecting fetal abnormality comprising: determining a ratio of abundance of maternal allele(s) to abundance of paternal allele(s) in genomic DNA from fetal cells enriched from a maternal blood sample using size-based separation.